Elasticsearch Notes (8): Tokenization and the IK Chinese Analysis Plugin


First, create an index with a custom analyzer built on the standard tokenizer:

curl -XPUT localhost:9200/local -d '{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "stem" : {
                    "tokenizer" : "standard",
                    "filter" : ["standard", "lowercase", "stop", "porter_stem"]
                }
            }
        }
    },
    "mappings" : {
        "article" : {
            "dynamic" : true,
            "properties" : {
                "title" : {
                    "type" : "string",
                    "analyzer" : "stem"
                }
            }
        }
    }
}'

index: local
type: article
analyzer: stem (filters: lowercase, stop words, Porter stemming)
field: title

Test:

# Sample Analysis 
curl -XGET localhost:9200/local/_analyze?analyzer=stem -d '{Fight for your life}'
curl -XGET localhost:9200/local/_analyze?analyzer=stem -d '{Bruno fights Tyson tomorrow}'
 
# Index Data
curl -XPUT localhost:9200/local/article/1 -d'{"title": "Fight for your life"}'
curl -XPUT localhost:9200/local/article/2 -d'{"title": "Fighting for your life"}'
curl -XPUT localhost:9200/local/article/3 -d'{"title": "My dad fought a dog"}'
curl -XPUT localhost:9200/local/article/4 -d'{"title": "Bruno fights Tyson tomorrow"}'
 
# search on the title field, which is stemmed on index and search
curl -XGET localhost:9200/local/_search?q=title:fight
 
# searching on _all will not do any stemming, unless _all is also configured in the mapping to be stemmed...
curl -XGET localhost:9200/local/_search?q=fight
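To see why a stemmed field can match several word forms, here is a minimal offline sketch in Python. It is not the Porter stemmer: `toy_stem` is a hypothetical suffix stripper used only for illustration, and `matching_ids` is a made-up helper that stands in for the search. The titles are the four documents indexed above.

```python
def toy_stem(word):
    """Strip a few common suffixes. A crude stand-in for the Porter
    stemmer, NOT an implementation of it (assumption for illustration)."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# The four titles indexed by the curl commands above.
TITLES = {
    1: "Fight for your life",
    2: "Fighting for your life",
    3: "My dad fought a dog",
    4: "Bruno fights Tyson tomorrow",
}


def matching_ids(query, titles=TITLES):
    """Return ids whose stemmed title words contain the stemmed query term."""
    stem = toy_stem(query.lower())
    return [
        doc_id
        for doc_id, title in titles.items()
        if stem in {toy_stem(word.lower()) for word in title.split()}
    ]


print(matching_ids("fight"))
```

In this sketch "fighting" and "fights" reduce to "fight", while the irregular form "fought" does not, so document 3 is not matched by suffix stripping alone.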

For example, analyzing the text

Fight for your life

with the stem analyzer produces these tokens:

{"tokens":[
{"token":"fight","start_offset":1,"end_offset":6,"type":"<ALPHANUM>","position":1},
{"token":"your","start_offset":11,"end_offset":15,"type":"<ALPHANUM>","position":3},
{"token":"life","start_offset":16,"end_offset":20,"type":"<ALPHANUM>","position":4}
]}
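The offsets and positions in this output can be reproduced offline. The sketch below is a simplified model, not Elasticsearch's code: it splits on alphanumeric runs, lowercases, and drops stop words while keeping their positions. The stop list is an assumed small subset, and stemming is left out because it does not change these three terms. The input keeps the surrounding braces from the curl body, which is why the first offset is 1.

```python
import re

# Assumed stop list: a small subset for illustration only.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}


def analyze(text):
    """Tokenize on alphanumeric runs, lowercase, and drop stop words.

    Positions are 1-based and still advance over removed stop words,
    which leaves a gap in the numbering.
    """
    tokens = []
    matches = re.finditer(r"[A-Za-z0-9]+", text)
    for position, match in enumerate(matches, start=1):
        term = match.group().lower()
        if term in STOP_WORDS:
            continue  # skipped, but its position number is consumed
        tokens.append(
            {
                "token": term,
                "start_offset": match.start(),
                "end_offset": match.end(),
                "position": position,
            }
        )
    return tokens


for token in analyze("{Fight for your life}"):
    print(token)
```

The stop word "for" occupies position 2, so the remaining tokens are numbered 1, 3, and 4, as in the output above.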

　　

To deploy the IK analyzer:
1) Put the IK analyzer plugin for ES into ./plugins/analyzerIK/
2) Configure it in elasticsearch.yml:
index.analysis.analyzer.ik.type : "ik"
3) Under config, add the directory ./config/ik containing:
IKAnalyzer.cfg.xml
main.dic
quantifier.dic
ext.dic
stopword.dic
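
Before restarting the node, it can help to confirm that the files from step 3 are in place. A minimal sketch, assuming the directory is config/ik under the Elasticsearch home; `missing_ik_files` is a hypothetical helper name, not part of the plugin.

```python
from pathlib import Path

# File names taken from the list in step 3.
EXPECTED_FILES = [
    "IKAnalyzer.cfg.xml",
    "main.dic",
    "quantifier.dic",
    "ext.dic",
    "stopword.dic",
]


def missing_ik_files(ik_dir):
    """Return the expected IK config files that are absent from ik_dir."""
    base = Path(ik_dir)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    # Assumed location; adjust to the actual Elasticsearch home.
    missing = missing_ik_files("config/ik")
    print("missing:", missing if missing else "none")
```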

Delete the previously created index, then recreate it as follows:

curl -XPUT localhost:9200/local -d '{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "ik" : {
                    "tokenizer" : "ik"
                }
            }
        }
    },
    "mappings" : {
        "article" : {
            "dynamic" : true,
            "properties" : {
                "title" : {
                    "type" : "string",
                    "analyzer" : "ik"
                }
            }
        }
    }
}'

　　
Test:

curl 'http://localhost:9200/index/_analyze?analyzer=ik&pretty=true' -d'  
{  
    "text":"中华人民共和国国歌"  
}  
'  
{
  "tokens" : [ {
    "token" : "text",
    "start_offset" : 12,
    "end_offset" : 16,
    "type" : "ENGLISH",
    "position" : 1
  }, {
    "token" : "中华人民共和国",
    "start_offset" : 19,
    "end_offset" : 26,
    "type" : "CN_WORD",
    "position" : 2
  }, {
    "token" : "国歌",
    "start_offset" : 26,
    "end_offset" : 28,
    "type" : "CN_WORD",
    "position" : 3
  } ]
}

　　
---------------------------------------
To get the finest-grained tokenization results, configure elasticsearch.yml as follows:

index:
  analysis:
    analyzer:
      ik:
          alias: [ik_analyzer]
          type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_smart:
          type: ik
          use_smart: true
      ik_max_word:
          type: ik
          use_smart: false

　　
Test:

curl 'http://localhost:9200/index/_analyze?analyzer=ik_max_word&pretty=true' -d'  
{  
    "text":"中华人民共和国国歌"  
}  
'  
{
  "tokens" : [ {
    "token" : "text",
    "start_offset" : 12,
    "end_offset" : 16,
    "type" : "ENGLISH",
    "position" : 1
  }, {
    "token" : "中华人民共和国",
    "start_offset" : 19,
    "end_offset" : 26,
    "type" : "CN_WORD",
    "position" : 2
  }, {
    "token" : "中华人民",
    "start_offset" : 19,
    "end_offset" : 23,
    "type" : "CN_WORD",
    "position" : 3
  }, {
    "token" : "中华",
    "start_offset" : 19,
    "end_offset" : 21,
    "type" : "CN_WORD",
    "position" : 4
  }, {
    "token" : "华人",
    "start_offset" : 20,
    "end_offset" : 22,
    "type" : "CN_WORD",
    "position" : 5
  }, {
    "token" : "人民共和国",
    "start_offset" : 21,
    "end_offset" : 26,
    "type" : "CN_WORD",
    "position" : 6
  }, {
    "token" : "人民",
    "start_offset" : 21,
    "end_offset" : 23,
    "type" : "CN_WORD",
    "position" : 7
  }, {
    "token" : "共和国",
    "start_offset" : 23,
    "end_offset" : 26,
    "type" : "CN_WORD",
    "position" : 8
  }, {
    "token" : "共和",
    "start_offset" : 23,
    "end_offset" : 25,
    "type" : "CN_WORD",
    "position" : 9
  }, {
    "token" : "国",
    "start_offset" : 25,
    "end_offset" : 26,
    "type" : "CN_CHAR",
    "position" : 10
  }, {
    "token" : "国歌",
    "start_offset" : 26,
    "end_offset" : 28,
    "type" : "CN_WORD",
    "position" : 11
  } ]
}
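
The difference between the two granularities can be illustrated with a toy dictionary segmenter. This is a sketch under stated assumptions, not IK's algorithm: `TOY_DICT` is a hypothetical ten-word-scale dictionary rather than main.dic, the input is assumed to be the nine-character sample phrase 中华人民共和国国歌 (consistent with offsets 19 to 28 in the output above), and the "smart" mode here is plain forward maximum matching standing in for IK's own disambiguation. Offsets are relative to the start of the phrase, so add 19 to compare with the output above. The single-character CN_CHAR token is not modeled.

```python
# Hypothetical toy dictionary, far smaller than IK's main.dic.
TOY_DICT = {
    "中华人民共和国", "中华人民", "中华", "华人",
    "人民共和国", "人民", "共和国", "共和", "国歌",
}


def max_word(text, words=TOY_DICT):
    """Finest granularity: emit every dictionary word at every start offset."""
    hits = []
    for start in range(len(text)):
        for end in range(len(text), start, -1):  # longest candidates first
            if text[start:end] in words:
                hits.append((text[start:end], start, end))
    return hits


def smart(text, words=TOY_DICT):
    """Coarse granularity via forward maximum matching (a stand-in for IK)."""
    hits, start = [], 0
    while start < len(text):
        for end in range(len(text), start, -1):
            if text[start:end] in words:
                hits.append((text[start:end], start, end))
                start = end  # continue after the matched word
                break
        else:
            start += 1  # no dictionary word starts here; skip one character
    return hits


SAMPLE = "中华人民共和国国歌"
print(smart(SAMPLE))
print(max_word(SAMPLE))
```

In this sketch the coarse mode yields two words, while the fine mode yields nine overlapping words, which mirrors the contrast between use_smart: true and use_smart: false.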
