Steps to quickly set up a morphological analysis environment with Elasticsearch on macOS Sierra

Tags: MacOSX, Python, morphological analysis, Elasticsearch

Preparation

Sorry for plugging my own article, but you will need it later: please install Jupyter Notebook on macOS by following the page below.
http://qiita.com/mix_dvd/items/d915752215db67919c06

Checking and installing Java

Run the following command to check whether Java is installed.

$ java -version

If Java is not installed, a dialog appears instead; click its "More Info..." button.

A web page opens; download the JDK from it and install it.

After installing, run the command again to confirm that Java is now available.

$ java -version
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
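If you would rather script this check, here is a minimal Python sketch. It assumes `java` is on the PATH and relies on the fact that `java -version` writes to stderr rather than stdout. The helper names are hypothetical, and reading "1.8.0_101" as major version 8 follows the legacy `1.x` numbering shown in the output above.

```python
import re
import subprocess


def parse_java_version(output: str) -> tuple:
    """Return (version_string, major_version) from `java -version` output."""
    match = re.search(r'version "([^"]+)"', output)
    if match is None:
        raise ValueError("no version string found")
    version = match.group(1)
    parts = version.split(".")
    # Legacy scheme "1.x.y_z" (Java 8 and earlier): major is the second field.
    if parts[0] == "1" and len(parts) > 1:
        return version, int(parts[1])
    # Newer scheme "11.0.2", "17-ea": major is the leading number.
    return version, int(re.match(r"\d+", parts[0]).group())


def installed_java_version() -> tuple:
    """Run `java -version` (which prints to stderr) and parse the result."""
    proc = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return parse_java_version(proc.stderr)
```

For the output above, `parse_java_version` yields `("1.8.0_101", 8)`.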

Installing Elasticsearch

[Official site] https://www.elastic.co/jp/products/elasticsearch

Installing the program

Run the following commands.

$ curl -O https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/zip/elasticsearch/2.3.4/elasticsearch-2.3.4.zip
$ unzip elasticsearch-2.3.4.zip
$ sudo mv elasticsearch-2.3.4 /usr/local/elasticsearch
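The same download-and-unpack steps can be sketched in Python with only the standard library. This is an illustration, not a replacement for the commands above: the URL pattern is generalized from the single 2.3.4 URL in the `curl` command, the helper names are hypothetical, and the `sudo mv` step is left out. Python's `zipfile` does not restore Unix permission bits on extraction, so the sketch re-applies them; otherwise `bin/elasticsearch` might not be executable.

```python
import os
import urllib.request
import zipfile

# Base URL taken from the curl command above (2.x-era release layout).
BASE = ("https://download.elastic.co/elasticsearch/release/org/elasticsearch/"
        "distribution/zip/elasticsearch")


def es_zip_url(version: str) -> str:
    """Build the download URL for a given release, following the 2.3.4 pattern."""
    return f"{BASE}/{version}/elasticsearch-{version}.zip"


def download_and_extract(version: str, dest_dir: str) -> str:
    """Download the zip into dest_dir, unpack it there, and return the unpacked path."""
    archive = os.path.join(dest_dir, f"elasticsearch-{version}.zip")
    urllib.request.urlretrieve(es_zip_url(version), archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)
        # Restore permission bits stored in each entry's external attributes.
        for info in zf.infolist():
            mode = (info.external_attr >> 16) & 0o777
            if mode:
                os.chmod(os.path.join(dest_dir, info.filename), mode)
    # Assumes the archive's top-level folder matches the name used by `mv` above.
    return os.path.join(dest_dir, f"elasticsearch-{version}")
```

For example, `download_and_extract("2.3.4", "/tmp")` would be expected to leave the files in `/tmp/elasticsearch-2.3.4`.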

Checking the version

$ /usr/local/elasticsearch/bin/elasticsearch --version
Version: 2.3.4, Build: e455fd0/2016-06-30T11:24:31Z, JVM: 1.8.0_101
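To read the version from a script, the one-line output can be split into fields. A small sketch; the `Version: ..., Build: ..., JVM: ...` layout is assumed from the single line above and may differ in other releases, and `parse_es_version` is a hypothetical name.

```python
def parse_es_version(line: str) -> dict:
    """Split 'Version: X, Build: Y, JVM: Z' into {'Version': X, 'Build': Y, 'JVM': Z}."""
    fields = {}
    for part in line.strip().split(", "):
        # Split on the first ": " only; the build timestamp contains bare colons.
        key, _, value = part.partition(": ")
        fields[key] = value
    return fields
```

Comparing `fields["JVM"]` with the `java -version` output is a quick way to confirm Elasticsearch picked up the JDK you just installed.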

Installing the plugin

Run the following commands.

$ cd /usr/local/elasticsearch
$ bin/plugin install analysis-kuromoji
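To confirm the plugin was installed, a sketch that looks for its directory. It assumes, as in Elasticsearch 2.x, that each plugin is unpacked into its own folder under `plugins/` named after the plugin; `plugin_installed` is a hypothetical helper.

```python
from pathlib import Path

# Install location used by the `sudo mv` command above.
ES_HOME = Path("/usr/local/elasticsearch")


def plugin_installed(name: str, es_home: Path = ES_HOME) -> bool:
    """Return True if plugins/<name> exists as a directory under es_home."""
    return (es_home / "plugins" / name).is_dir()
```

After the install above, `plugin_installed("analysis-kuromoji")` should return True.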

Starting Elasticsearch

Run the following command.

$ /usr/local/elasticsearch/bin/elasticsearch
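This command runs in the foreground and takes a few seconds before it accepts requests. A script that depends on the server can poll the root URL until it answers. A minimal standard-library sketch; the function name, timeout, and polling interval are arbitrary choices, not anything Elasticsearch prescribes.

```python
import json
import time
import urllib.request


def wait_for_es(url: str = "http://localhost:9200", timeout: float = 60.0,
                interval: float = 1.0):
    """Poll `url` until it returns JSON or `timeout` seconds pass.

    Returns the parsed response on success, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                return json.load(resp)
        except OSError:
            # Connection refused, timeout, HTTP error: the node is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```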

Verifying that it works

Open another terminal window and run the following command.

$ curl localhost:9200

Alternatively, open http://localhost:9200 in a web browser.

If you get a response like the following, Elasticsearch has started successfully.

{
  "name" : "Akasha",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "2.3.4",
    "build_hash" : "Xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    "build_timestamp" : "2016-06-30T11:24:31Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.0"
  },
  "tagline" : "You Know, for Search"
}
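The same check can be done from Python by parsing the JSON above. A sketch, assuming you only want to confirm the server is a 2.x release (matching the 2.3.4 install); `check_root_response` is a hypothetical name. The body can come from `curl` or from `urllib.request.urlopen("http://localhost:9200").read()`.

```python
import json


def check_root_response(body: str, expected_major: int = 2) -> str:
    """Parse the JSON returned by GET / and return version.number.

    Raises RuntimeError if the major version differs from expected_major.
    """
    info = json.loads(body)
    number = info["version"]["number"]
    if int(number.split(".")[0]) != expected_major:
        raise RuntimeError(f"expected {expected_major}.x, got {number}")
    return number
```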

Installing the Python library

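Run the following command. The Python client's major version should match the server's, so this pins the 2.x client to go with Elasticsearch 2.3.4.

$ pip install "elasticsearch>=2.0.0,<3.0.0"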
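The elasticsearch-py compatibility notes recommend using the client major version that matches the Elasticsearch major version (2.x for this setup). A small sketch of that check; the helper names are hypothetical, and `importlib.metadata` needs Python 3.8 or later.

```python
from importlib.metadata import PackageNotFoundError, version


def major(v: str) -> int:
    """Major component of a dotted version string, e.g. "2.3.4" -> 2."""
    return int(v.split(".")[0])


def client_matches_server(client_version: str, server_version: str) -> bool:
    """True when the client and server share a major version."""
    return major(client_version) == major(server_version)


def installed_client_version():
    """Version of the installed `elasticsearch` package, or None if it is absent."""
    try:
        return version("elasticsearch")
    except PackageNotFoundError:
        return None
```

If `installed_client_version()` returns a string, passing it to `client_matches_server` together with the server's "2.3.4" tells you whether the pinned install took effect.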

Running the sample code

Save the code below as test.py.

test.py


# coding: utf-8

# # Elasticsearch

from elasticsearch import Elasticsearch

# Connect to the local node started above
es = Elasticsearch("localhost:9200")


# # Initialize variables

esIndex = "bot"    # index name
esType = "talks"   # document type (Elasticsearch 2.x still uses mapping types)


# # Add documents

# - curl -X POST http://localhost:9200/bot/talks -d '{"mode":"あいさつ", "words":"おはようございます"}'

# Without an explicit id, Elasticsearch generates one for each document
es.index(index=esIndex, doc_type=esType, body={"mode":"あいさつ", "words":"おはようございます"})
es.index(index=esIndex, doc_type=esType, body={"mode":"あいさつ", "words":"こんにちは"})
es.index(index=esIndex, doc_type=esType, body={"mode":"あいさつ", "words":"こんばんは"})
es.index(index=esIndex, doc_type=esType, body={"mode":"あいさつ", "words":"さようなら"})
es.index(index=esIndex, doc_type=esType, body={"mode":"あいさつ", "words":"おやすみなさい"})
es.index(index=esIndex, doc_type=esType, body={"mode":"名言", "words":"死して屍拾うものなし"})


# # Update a document

# - curl -X PUT http://localhost:9200/bot/talks/AVYGQm6Q8mtRod8eIWiq -d '{"mode":"あいさつ","words":"お休みなさい"}'
#
# If the id exists, the document is replaced; if it does not, a new document is added.
# This id was auto-generated in the author's environment, so on a fresh index
# the call below simply adds another document.

es.index(index=esIndex, doc_type=esType, id="AVYGQm6Q8mtRod8eIWiq", body={"mode":"あいさつ", "words":"また明日"})


# # Retrieve data

# - curl -X GET http://localhost:9200/bot/talks/_search?pretty -d '{"query":{"match_all":{}}}'

# New documents become searchable only after a refresh (every second by
# default), so force one; otherwise a script run may not find them yet.
es.indices.refresh(index=esIndex)

res = es.search(index=esIndex, body={"query": {"match_all": {}}})

words = []
modes = []

for hit in res["hits"]["hits"]:
    row = hit["_source"]
    print(row)
    words.append(row["words"])
    modes.append(row["mode"])


# # Delete data

# - curl -X DELETE http://localhost:9200/bot/

# Uncomment to delete the whole index (and every document in it)
#es.indices.delete(index="bot")


# # Using the plugin

# - Morphological analysis

text = "今日はいい天気ですね"


def analyze(es, text):
    """Run text through the kuromoji analyzer and return the list of tokens."""
    data = es.indices.analyze(index=esIndex, body={"analyzer": "kuromoji", "text": text})
    return [token["token"] for token in data["tokens"]]


tokens = analyze(es, text)
print(' '.join(tokens))

# Tokenize every phrase retrieved above
for word in words:
    print(' '.join(analyze(es, word)))

Run the following command.

$ python test.py

If you see output like the following, it works!

{'mode': 'あいさつ', 'words': 'おはようございます'}
{'mode': 'あいさつ', 'words': 'こんばんは'}
{'mode': 'あいさつ', 'words': 'こんにちは'}
{'mode': 'あいさつ', 'words': 'さようなら'}
{'mode': 'あいさつ', 'words': 'おやすみなさい'}
{'mode': '名言', 'words': '死して屍拾うものなし'}
{'mode': 'あいさつ', 'words': 'また明日'}
今日 いい 天気
おはよう
こんばんは
こんにちは
さようなら
おやすみなさい
死す 屍 拾う
明日
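The `analyze` function in test.py goes through the Python client. For reference, the underlying REST call, a POST to `/<index>/_analyze` with the analyzer name and text in a JSON body, can be sketched with the standard library alone. This assumes the server from the steps above on `localhost:9200` and that the `bot` index already exists (test.py creates it); the function names are hypothetical.

```python
import json
import urllib.request


def analyze_request(text: str, index: str = "bot", analyzer: str = "kuromoji",
                    base: str = "http://localhost:9200") -> urllib.request.Request:
    """Build a POST /<index>/_analyze request carrying analyzer and text as JSON."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def tokens_from_response(data: dict) -> list:
    """Pull the surface form of each token out of an _analyze response."""
    return [t["token"] for t in data.get("tokens", [])]


def analyze_raw(text: str) -> list:
    """Send the request and return the token list (needs a running server)."""
    with urllib.request.urlopen(analyze_request(text)) as resp:
        return tokens_from_response(json.load(resp))
```

Since it hits the same endpoint with the same analyzer, `analyze_raw("今日はいい天気ですね")` would be expected to return the same tokens as `analyze` in test.py.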

So, what should I do with this next? (^_^;)

Addendum

Oh, and it turns out I never actually used Jupyter Notebook here (sweat).

Reference

Original article (Steps to quickly set up a morphological analysis environment with Elasticsearch on macOS Sierra): https://qiita.com/mix_dvd/items/2f604e6d1897ea112097

