Built-in analyzers & custom analyzer
1. standard analyzer
- splits text at word boundaries and removes punctuation
  - done by the standard tokenizer
- lowercases letters
  - done by the lowercase token filter
- includes the stop token filter
  - disabled by default
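The behavior above can be inspected with the _analyze API; a minimal sketch (the sample text is an illustration, not from the original post):

```json
POST /_analyze
{
  "analyzer": "standard",
  "text": "The 2 QUICK Brown-Foxes!"
}
```

This should produce lowercased tokens with punctuation stripped, roughly: the, 2, quick, brown, foxes.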
2. simple analyzer
similar to the standard analyzer
- splits text into tokens when encountering anything other than letters
- lowercases letters
  - done by the lowercase tokenizer
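The difference from the standard analyzer shows up with non-letter characters; a quick check (sample text is an illustration):

```json
POST /_analyze
{
  "analyzer": "simple",
  "text": "The 2 QUICK Brown-Foxes!"
}
```

Since digits are not letters, the "2" is dropped entirely, leaving roughly: the, quick, brown, foxes.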
3. whitespace analyzer
- splits text into tokens on whitespace
- does not lowercase letters
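The same sample text run through the whitespace analyzer illustrates both points:

```json
POST /_analyze
{
  "analyzer": "whitespace",
  "text": "The 2 QUICK Brown-Foxes!"
}
```

Only whitespace splits tokens, so case and punctuation survive: The, 2, QUICK, Brown-Foxes!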
4. keyword analyzer
- no-op analyzer that leaves the text intact
- outputs the entire input as a single token
- used for keyword fields by default
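A quick sketch of the no-op behavior (sample text is an illustration):

```json
POST /_analyze
{
  "analyzer": "keyword",
  "text": "The 2 QUICK Brown-Foxes!"
}
```

The output is the entire input as one token: "The 2 QUICK Brown-Foxes!".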
5. pattern analyzer
- a regex is used to match token separators
- the default pattern matches all non-word characters
- lowercases letters
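A custom separator pattern can be supplied via the "pattern" parameter; a minimal sketch that splits on commas (the index name "pattern_test" and analyzer name "comma_analyzer" are hypothetical):

```json
PUT /pattern_test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "comma_analyzer": {
          "type": "pattern",
          "pattern": ","
        }
      }
    }
  }
}
```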
6. language-specific analyzers
ex) english analyzer
- standard tokenizer
- filters: english possessive stemmer, lowercase, english stop, english keywords, english stemmer
ex) using the english analyzer
PUT /products
{
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "analyzer": "english"
      }
    }
  }
}
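Once the mapping exists, the field's analyzer can be exercised directly via the field parameter of _analyze; a sketch (sample text is an illustration):

```json
GET /products/_analyze
{
  "field": "description",
  "text": "These are not the droids you're looking for."
}
```

With the english analyzer, stop words should be removed and the remaining tokens stemmed, e.g. "droids" to "droid" and "looking" to "look".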
7. configure built-in analyzers
ex) adding a stopwords filter to the standard analyzer (note: analyzers must be nested under "analysis" in the index settings)
PUT /products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "remove_english_stop_words": {
          "type": "standard",
          "stopwords": "_english_"
        }
      }
    }
  }
}
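The configured analyzer can then be referenced by its custom name when analyzing against that index; a sketch (sample text is an illustration):

```json
POST /products/_analyze
{
  "analyzer": "remove_english_stop_words",
  "text": "a quick brown fox"
}
```

The English stop word "a" should be dropped, leaving: quick, brown, fox.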
- the parameters supported by each analyzer type are listed in the analyzer reference
8. custom analyzers
PUT /analyzer_test
{
  "settings": {
    "analysis": {
      "filter": {
        "danish_stop": {
          "type": "stop",
          "stopwords": "_danish_"
        }
      },
      "char_filter": {},
      "tokenizer": {},
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],
          "tokenizer": "standard",
          "filter": ["lowercase", "danish_stop", "asciifolding"]
        }
      }
    }
  }
}
- custom analyzers must be declared within the "analyzer" object under "analysis"
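A sketch of testing the custom analyzer end to end (sample text is an illustration):

```json
POST /analyzer_test/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "<p>Er du færdig?</p>"
}
```

The html_strip char filter should remove the tags, the lowercase and danish_stop filters normalize and drop Danish stop words such as "er" and "du", and asciifolding converts accented characters to their ASCII equivalents.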
Author And Source
About this topic (Built-in analyzers & custom analyzer), more material can be found at https://velog.io/@sangmin7648/Built-in-analyzers-custom-analyzer. Author attribution: the original author's information is contained in the URL above, and copyright belongs to the original author. (Collection and Share based on the CC Protocol.)