Shards and replicas in Elasticsearch
When you download Elasticsearch and start it up, you create an Elasticsearch node, which tries to join an existing cluster if one is available or creates a new one. Let's say you created your own new cluster with a single node, the one that you just started up. We have no data, therefore we need to create an index.
When you create an index (an index is also created automatically when you index the first document), you can define how many shards it will be composed of. If you don't specify a number, it will have the default number of shards: 5 primaries (this was the default before Elasticsearch 7.0; from 7.0 on, the default is 1). What does that mean?
It means that Elasticsearch will create 5 primary shards that will contain your data:
 ____    ____    ____    ____    ____
| 1  |  | 2  |  | 3  |  | 4  |  | 5  |
|____|  |____|  |____|  |____|  |____|
Every time you index a document, Elasticsearch decides which primary shard is supposed to hold that document and indexes it there. Primary shards are not a copy of the data, they are the data! With a single node, of course, multiple shards don't make much sense, but if we start another Elasticsearch instance on the same cluster, the shards will be distributed evenly over the cluster.
Node 1 will then hold, for example, only three shards:
 ____    ____    ____
| 1  |  | 2  |  | 3  |
|____|  |____|  |____|
Since the remaining two shards have been moved to the newly started node:
 ____    ____
| 4  |  | 5  |
|____|  |____|
Why does this happen? Because Elasticsearch is a distributed search engine, and this way you can make use of multiple nodes/machines to manage large amounts of data.
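To make the "Elasticsearch decides which primary shard holds the document" step more concrete, here is a minimal sketch of hash-based routing. It assumes the commonly documented shape of the rule (hash the routing value, which defaults to the document id, then take it modulo the number of primary shards). The function name `pick_primary_shard` is hypothetical, and CRC32 is only a stand-in for the Murmur3 hash Elasticsearch uses internally, so the shard numbers printed here will not match a real cluster.

```python
import zlib


def pick_primary_shard(routing_value: str, number_of_primary_shards: int) -> int:
    """Return a 0-based primary shard number for a routing value.

    Illustration only: CRC32 is a stand-in hash (assumption), chosen because
    it is in the standard library and gives the same result on every run.
    """
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    return hashed % number_of_primary_shards


if __name__ == "__main__":
    # The same id always maps to the same shard, which is what lets a later
    # lookup by id go straight to the shard that holds the document.
    for doc_id in ("user-1", "user-2", "user-3"):
        print(doc_id, "-> shard", pick_primary_shard(doc_id, 5))
```

A rule of this shape also suggests why the number of primary shards is chosen when the index is created: changing the divisor would map existing documents to different shards.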
Every Elasticsearch index is composed of at least one primary shard, since that's where the data is stored. Every shard comes at a cost, though, so if you have a single node and no foreseeable growth, just stick with a single primary shard.
The other type of shard is the replica. The default is 1 replica, meaning that every primary shard will be copied to another shard that contains the same data. Replicas are used to increase search performance and for fail-over. A replica shard is never allocated on the same node as its primary (that would be pretty much like putting a backup on the same disk as the original data).
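The two numbers discussed so far correspond to the index settings `number_of_shards` and `number_of_replicas`, which can be passed in the body of the create-index request (`PUT /<index-name>`). The sketch below only builds that body and does the shard arithmetic; it does not talk to a cluster, and the helper names `index_settings_body` and `total_shard_copies` are made up for illustration.

```python
import json


def index_settings_body(primaries: int = 5, replicas: int = 1) -> dict:
    """Build a create-index request body with explicit shard counts."""
    if primaries < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("the replica count cannot be negative")
    return {
        "settings": {
            "number_of_shards": primaries,    # primary shards: they are the data
            "number_of_replicas": replicas,   # copies kept of each primary
        }
    }


def total_shard_copies(primaries: int, replicas: int) -> int:
    """Each primary exists once, plus once per configured replica."""
    return primaries * (1 + replicas)


if __name__ == "__main__":
    print(json.dumps(index_settings_body(5, 1), indent=2))
    print(total_shard_copies(5, 1))  # 10: five primaries plus five replicas
```

For the single-node, no-growth case mentioned above, `index_settings_body(1, 0)` would describe one primary and no replicas.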
Back to our example: with 1 replica we'll have the whole index on each node, since 2 replica shards will be allocated on the first node, and they will contain exactly the same data as the primaries on the second node:
 ____    ____    ____    ____    ____
| 1  |  | 2  |  | 3  |  | 4R |  | 5R |
|____|  |____|  |____|  |____|  |____|
Same for the second node, which will contain a copy of the primary shards on the first node:
 ____    ____    ____    ____    ____
| 1R |  | 2R |  | 3R |  | 4  |  | 5  |
|____|  |____|  |____|  |____|  |____|
With a setup like this, if a node goes down you still have the whole index. The replica shards will automatically become primaries and the cluster will work properly despite the node failure.
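The placement rule and the fail-over behaviour described above can be played through with a toy model. This is a sketch under simplifying assumptions, not Elasticsearch's real allocator (which weighs many more factors): one replica per primary, a replica goes to the least-loaded node that does not hold its primary, and when a node fails the surviving replicas of its primaries are promoted. The names `place_replicas` and `fail_node` are hypothetical, and shards are plain strings such as `"4"` (primary) and `"4R"` (replica), matching the diagrams.

```python
from typing import Dict, List


def place_replicas(primaries_by_node: Dict[str, List[int]]) -> Dict[str, List[str]]:
    """Add one replica per primary, never on the node that holds the primary."""
    layout = {node: [str(n) for n in shards] for node, shards in primaries_by_node.items()}
    for node, shards in primaries_by_node.items():
        for number in shards:
            candidates = [other for other in layout if other != node]
            if not candidates:
                continue  # single-node cluster: the replica stays unassigned
            target = min(candidates, key=lambda other: len(layout[other]))
            layout[target].append(f"{number}R")
    return layout


def fail_node(layout: Dict[str, List[str]], failed: str) -> Dict[str, List[str]]:
    """Drop a node and promote surviving replicas of the primaries it held."""
    lost_primaries = {s for s in layout[failed] if not s.endswith("R")}
    survivors = {}
    for node, shards in layout.items():
        if node == failed:
            continue
        survivors[node] = [
            s[:-1] if s.endswith("R") and s[:-1] in lost_primaries else s
            for s in shards
        ]
    return survivors


if __name__ == "__main__":
    layout = place_replicas({"node1": [1, 2, 3], "node2": [4, 5]})
    print(layout)
    # {'node1': ['1', '2', '3', '4R', '5R'], 'node2': ['4', '5', '1R', '2R', '3R']}
    print(fail_node(layout, "node2"))
    # {'node1': ['1', '2', '3', '4', '5']}
```

After `node2` fails, `4R` and `5R` on `node1` are promoted, so `node1` alone holds a primary for every shard, which is the "you still have the whole index" situation described above.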
Licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.