elasticsearch 구수편(8)분사 중국어분사ik 플러그인

7598 단어 elasticsearch
먼저 다음과 같은 표준 분사(standard)를 설정합니다.
curl -XPUT localhost:9200/local -d '{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "stem" : {
                    "tokenizer" : "standard",
                    "filter" : ["standard", "lowercase", "stop", "porter_stem"]
                }
            }
        }
    },
    "mappings" : {
        "article" : {
            "dynamic" : true,
            "properties" : {
                "title" : {
                    "type" : "string",
                    "analyzer" : "stem"
                }
            }
        }
    }
}'

index:local
type:article
default analyzer:stem(filter:소문자, 정지어 등)
field:title  
테스트:
# Sample Analysis 
curl -XGET localhost:9200/local/_analyze?analyzer=stem -d '{Fight for your life}'
curl -XGET localhost:9200/local/_analyze?analyzer=stem -d '{Bruno fights Tyson tomorrow}'
 
# Index Data
curl -XPUT localhost:9200/local/article/1 -d'{"title": "Fight for your life"}'
curl -XPUT localhost:9200/local/article/2 -d'{"title": "Fighting for your life"}'
curl -XPUT localhost:9200/local/article/3 -d'{"title": "My dad fought a dog"}'
curl -XPUT localhost:9200/local/article/4 -d'{"title": "Bruno fights Tyson tomorrow"}'
 
# search on the title field, which is stemmed on index and search
curl -XGET localhost:9200/local/_search?q=title:fight
 
# searching on _all will not do anystemming, unless also configured on the mapping to be stemmed...
curl -XGET localhost:9200/local/_search?q=fight

예를 들면 다음과 같습니다.
Fight for your life

분사는 다음과 같다.
{"tokens":[
{"token":"fight","start_offset":1,"end_offset":6,"type":"<ALPHANUM>","position":1},
{"token":"your","start_offset":11,"end_offset":15,"type":"<ALPHANUM>","position":3},
{"token":"life","start_offset":16,"end_offset":20,"type":"<ALPHANUM>","position":4} ]}

  
 
ik 분사기를 배치하려면:
1) ik분사기 플러그인(es)을./plugins/analyzerIK/중간
2)elasticsearch.yml에서 구성
index.analysis.analyzer.ik.type : "ik"
3) config에./추가config/ik
IKAnalyzer.cfg.xml
main.dic
quantifier.dic
ext.dic
stopword.dic
 
delete 이전에 생성된 index는 다음과 같이 재구성됩니다.
curl -XPUT localhost:9200/local -d '{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "ik" : {
                    "tokenizer" : "ik"
                }
            }
        }
    },
    "mappings" : {
        "article" : {
            "dynamic" : true,
            "properties" : {
                "title" : {
                    "type" : "string",
                    "analyzer" : "ik"
                }
            }
        }
    }
}'

  
테스트:
curl 'http://localhost:9200/index/_analyze?analyzer=ik&pretty=true' -d'  
{  
    "text":"         "  
}  
'  
{
  "tokens" : [ {
    "token" : "text",
    "start_offset" : 12,
    "end_offset" : 16,
    "type" : "ENGLISH",
    "position" : 1
  }, {
    "token" : "       ",
    "start_offset" : 19,
    "end_offset" : 26,
    "type" : "CN_WORD",
    "position" : 2
  }, {
    "token" : "  ",
    "start_offset" : 26,
    "end_offset" : 28,
    "type" : "CN_WORD",
    "position" : 3
  } ]
}

  
 ---------------------------------------
만약 우리가 가장 가는 입도의 분사 결과를 되돌려 주려면elasticsearch.yml의 구성은 다음과 같습니다.
index:
  analysis:
    analyzer:
      ik:
          alias: [ik_analyzer]
          type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_smart:
          type: ik
          use_smart: true
      ik_max_word:
          type: ik
          use_smart: false

  
테스트:
curl 'http://localhost:9200/index/_analyze?analyzer=ik_max_word&pretty=true' -d'  
{  
    "text":"         "  
}  
'  
{
  "tokens" : [ {
    "token" : "text",
    "start_offset" : 12,
    "end_offset" : 16,
    "type" : "ENGLISH",
    "position" : 1
  }, {
    "token" : "       ",
    "start_offset" : 19,
    "end_offset" : 26,
    "type" : "CN_WORD",
    "position" : 2
  }, {
    "token" : "    ",
    "start_offset" : 19,
    "end_offset" : 23,
    "type" : "CN_WORD",
    "position" : 3
  }, {
    "token" : "  ",
    "start_offset" : 19,
    "end_offset" : 21,
    "type" : "CN_WORD",
    "position" : 4
  }, {
    "token" : "  ",
    "start_offset" : 20,
    "end_offset" : 22,
    "type" : "CN_WORD",
    "position" : 5
  }, {
    "token" : "     ",
    "start_offset" : 21,
    "end_offset" : 26,
    "type" : "CN_WORD",
    "position" : 6
  }, {
    "token" : "  ",
    "start_offset" : 21,
    "end_offset" : 23,
    "type" : "CN_WORD",
    "position" : 7
  }, {
    "token" : "   ",
    "start_offset" : 23,
    "end_offset" : 26,
    "type" : "CN_WORD",
    "position" : 8
  }, {
    "token" : "  ",
    "start_offset" : 23,
    "end_offset" : 25,
    "type" : "CN_WORD",
    "position" : 9
  }, {
    "token" : " ",
    "start_offset" : 25,
    "end_offset" : 26,
    "type" : "CN_CHAR",
    "position" : 10
  }, {
    "token" : "  ",
    "start_offset" : 26,
    "end_offset" : 28,
    "type" : "CN_WORD",
    "position" : 11
  } ]
}

  
 
 

좋은 웹페이지 즐겨찾기