Built-in analyzers & custom analyzer

1. standard analyzer

  • splits text at word boundaries and removes punctuation
    • done by the standard tokenizer
  • lowercases letters
    • done by the lowercase token filter
  • includes a stop token filter
    • disabled by default
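
ex) the behavior can be checked with the _analyze API (sample text and expected tokens below are illustrative, not from an actual run)

POST /_analyze
{
    "analyzer" : "standard",
    "text" : "The 2 QUICK Brown-Foxes jumped!"
}
  • should return roughly : [the, 2, quick, brown, foxes, jumped]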

2. simple analyzer

similar to the standard analyzer

  • splits text into tokens whenever it encounters a character that is not a letter
  • lowercases letters
    • both are done by the lowercase tokenizer
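
ex) a quick _analyze check (sample text and expected tokens are illustrative)

POST /_analyze
{
    "analyzer" : "simple",
    "text" : "The 2 QUICK Brown-Foxes jumped!"
}
  • should return roughly : [the, quick, brown, foxes, jumped] (the digit 2 is dropped because it is not a letter)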

3. whitespace analyzer

  • splits text into tokens on whitespace
  • does not lowercase letters
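
ex) illustrative _analyze check

POST /_analyze
{
    "analyzer" : "whitespace",
    "text" : "The 2 QUICK Brown-Foxes jumped!"
}
  • should return : [The, 2, QUICK, Brown-Foxes, jumped!] (case and punctuation are kept)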

4. keyword analyzer

  • no-op analyzer that leaves the text intact
    • outputs the entire input as a single token
  • used for keyword fields by default
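
ex) illustrative _analyze check

POST /_analyze
{
    "analyzer" : "keyword",
    "text" : "New York City"
}
  • should return the whole input as one unmodified token : ["New York City"]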

5. pattern analyzer

  • a regex is used to match token separators
  • the default pattern (\W+) matches all non-word characters
  • lowercases letters
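
ex) illustrative _analyze check (the default \W+ pattern is assumed)

POST /_analyze
{
    "analyzer" : "pattern",
    "text" : "Foo-Bar_Baz 42"
}
  • should return roughly : [foo, bar_baz, 42] (underscore is a word character, so bar_baz is not split)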

6. language specific analyzers

ex) english analyzer

  • standard tokenizer
  • filters : english possessive stemmer, lowercase, english stop, english keyword marker (optional), english stemmer

ex) using english analyzer

PUT /products
{
    "mappings" : {
        "properties" : {
            "description" : {
                "type" : "text",
                "analyzer" : "english"
            }
        }
    }
}
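
ex) testing the english analyzer (expected tokens below are what I would expect, shown for illustration)

POST /_analyze
{
    "analyzer" : "english",
    "text" : "The QUICK brown foxes jumped over the lazy dog"
}
  • should return roughly : [quick, brown, fox, jump, over, lazi, dog] (stop words removed, remaining terms stemmed)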

7. configure built-in analyzers

ex) configuring the standard analyzer with english stop words

PUT /products
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "remove_english_stop_words" : {
                    "type" : "standard",
                    "stopwords" : "_english_"
                }
            }
        }
    }
}
  • configured analyzers must be defined under the "analysis" object in the index settings
  • the parameters supported by each analyzer type can be found in the built-in analyzers reference
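
ex) testing the configured analyzer against the products index (expected tokens are illustrative)

POST /products/_analyze
{
    "analyzer" : "remove_english_stop_words",
    "text" : "The quick brown fox"
}
  • should return roughly : [quick, brown, fox] ("the" is removed as an english stop word)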

8. custom analyzers

PUT /analyzer_test
{
    "settings" : {
        "analysis" : {
            "filter" : {
                "danish_stop" : {
                    "type" : "stop",
                    "stopwords" : "_danish_"
                }
            },
            "char_filter" : {},
            "tokenizer" : {},
            "analyzer" : {
                "my_custom_analyzer" : {
                    "type" : "custom",
                    "char_filter" : ["html_strip"],
                    "tokenizer" : "standard",
                    "filter" : ["lowercase", "danish_stop", "asciifolding"]
                }
            }
        }
    }
}
  • the custom analyzer must be declared within the "analyzer" object under "analysis"
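
ex) testing the custom analyzer against the analyzer_test index (sample Danish text and expected tokens are illustrative)

POST /analyzer_test/_analyze
{
    "analyzer" : "my_custom_analyzer",
    "text" : "<p>Vi besøger <b>København</b></p>"
}
  • should return roughly : [besoger, kobenhavn] (HTML stripped, "vi" removed as a Danish stop word, ø folded to o by asciifolding)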
