Attention-style visualization

This is based on the original, with a few modifications to the functions on my side.


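from html import escape


def highlight(word, attn):
    "Return an HTML span whose background becomes a deeper red as the attention value grows."
    attn = min(max(attn, 0.0), 1.0)  # clamp to 0..1 so the color code stays valid
    html_color = '#%02X%02X%02X' % (
        255, int(255*(1 - attn)), int(255*(1 - attn)))
    # escape the word so characters such as < or & are displayed literally
    return '<span style="background-color: {}"> {}</span>'.format(html_color, escape(word))

def mk_html(words, attns):
    "Concatenate one highlighted span per (word, score) pair."
    html = ""
    for word, attn in zip(words, attns):
        html += ' ' + highlight(word, attn)
    return html + "<br><br>"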
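To make the color mapping in highlight concrete: the red channel stays at 255 while green and blue both equal int(255*(1 - attn)), so a score of 0 gives white (#FFFFFF), 0.5 gives #FF7F7F, and 1 gives pure red (#FF0000). The sketch below isolates just that formula; the helper name attn_to_hex is hypothetical and not part of the original code.

```python
def attn_to_hex(attn):
    """Map an attention score to the background color used by highlight().

    Red stays at 255; green and blue shrink as the score grows,
    so 0.0 -> white and 1.0 -> pure red.
    """
    attn = min(max(attn, 0.0), 1.0)  # clamp so out-of-range scores still give a valid color
    level = int(255 * (1 - attn))
    return '#%02X%02X%02X' % (255, level, level)


for score in (0.0, 0.25, 0.5, 1.0):
    print(score, attn_to_hex(score))
```

Because green and blue move together, the hue stays red and only the saturation changes, which keeps the scale easy to read at a glance.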

Passing a list of words and a list of scores to mk_html produces an attention-style visualization.
This time, the sample data is generated with simple rules.


import random

from janome.tokenizer import Tokenizer

t = Tokenizer()

# Target sentence
sentence = "すもももももももものうち"

# Rule-based score calculation
words = []
attns = []

for token in t.tokenize(sentence):
    words.append(token.surface)
    if token.part_of_speech.startswith('名詞'):  # nouns (名詞) get higher scores
        attns.append(0.6 + 0.2 * random.random())  # 0 to 1; the closer to 1, the redder
    else:
        attns.append(0.2 + 0.2 * random.random())  # 0 to 1; the closer to 1, the redder

print(words)
print(attns)

Output:
['すもも', 'も', 'もも', 'も', 'もも', 'の', 'うち']
[0.7397653917795854, 0.38089028798835384, 0.7423692670794249, 0.27461082771113776, 0.614260883381291
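The rule above draws fresh random noise on every run, so the colors change each time the cell is executed. If reproducible output is preferred, one option is a local random.Random with a fixed seed plus a lookup table of weights per part of speech. In this sketch, POS_WEIGHTS, DEFAULT_WEIGHT, and rule_based_scores are hypothetical names, the weights are arbitrary, and the (surface, part-of-speech) pairs are hard-coded stand-ins for janome tokens (simplified to the top-level category) so that the example runs without janome installed.

```python
import random

# Hypothetical weights per top-level part of speech (arbitrary, not from the original).
POS_WEIGHTS = {'名詞': 0.7, '動詞': 0.5, '形容詞': 0.5}
DEFAULT_WEIGHT = 0.3  # used for any category not listed above, e.g. particles (助詞)


def rule_based_scores(tokens, jitter=0.1, seed=0):
    """Score (surface, part_of_speech) pairs by part of speech, with reproducible noise."""
    rng = random.Random(seed)  # local seeded RNG: the same scores on every run
    words, attns = [], []
    for surface, pos in tokens:
        # janome's part_of_speech is comma-separated; keep only the top-level category
        base = POS_WEIGHTS.get(pos.split(',')[0], DEFAULT_WEIGHT)
        score = base + rng.uniform(-jitter, jitter)
        words.append(surface)
        attns.append(min(max(score, 0.0), 1.0))  # keep within 0..1
    return words, attns


# Hard-coded stand-ins for janome tokens, so the sketch runs without janome.
tokens = [('すもも', '名詞'), ('も', '助詞'), ('もも', '名詞'),
          ('も', '助詞'), ('もも', '名詞'), ('の', '助詞'), ('うち', '名詞')]
words, attns = rule_based_scores(tokens)
print(words)
print(attns)
```

With janome available, the same function could be fed directly: rule_based_scores((tk.surface, tk.part_of_speech) for tk in t.tokenize(sentence)).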

Using these results, we visualize the text analysis in attention style.

from IPython.display import HTML, display

# Render the highlighted spans inline in a Jupyter notebook
display(HTML(mk_html(words, attns)))

Output:

The result looks something like this.
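display() only renders inside Jupyter or IPython. When running as a plain script, one option is to wrap the fragment in a standalone page and open it in a browser. The helper save_html and the file name attention.html are hypothetical; the explicit UTF-8 meta tag and write encoding are there so Japanese surface forms display correctly.

```python
from pathlib import Path


def save_html(fragment, path="attention.html"):
    """Wrap an HTML fragment (e.g. the string returned by mk_html) in a
    minimal UTF-8 page and write it to disk for viewing in a browser."""
    page = (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>attention</title></head>\n'
        "<body>" + fragment + "</body></html>\n"
    )
    out = Path(path)
    out.write_text(page, encoding="utf-8")  # explicit encoding for Japanese text
    return out


# In the notebook context this would be: save_html(mk_html(words, attns))
out = save_html('<span style="background-color: #FF0000"> すもも</span>')
print(out.resolve())
```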

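For quick visualizations of text analysis results I had been using tools such as WordCloud, but attention-style visualization also makes the words worth focusing on easy to spot, and I personally recommend it.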

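This kind of visualization is usually associated with models such as the Transformer and BERT, but it works whenever you can produce a sequence of words and a matching sequence of scores, so I found its range of applications to be quite broad.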
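The color formula in highlight is designed for scores between 0 and 1, so scores from another source (TF-IDF weights, model attention, and so on) may need rescaling first; clamping alone would paint every value above 1 the same red. Min-max normalization is one simple option. The helper minmax_normalize is hypothetical and the raw numbers below are made up for illustration.

```python
def minmax_normalize(scores):
    """Rescale arbitrary real-valued scores into the 0..1 range used by highlight()."""
    scores = list(scores)
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:  # all scores equal: avoid division by zero, use a neutral mid value
        return [0.5] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


# Made-up raw weights for illustration (e.g. from TF-IDF or another scorer).
raw = [3.2, 0.4, 2.8, 0.4, 2.8, 0.1, 1.5]
print(minmax_normalize(raw))
```

The result can go straight into mk_html(words, minmax_normalize(raw)). Note that min-max always paints the lowest-scoring word white and the highest pure red, which can exaggerate small differences; when all scores are non-negative, dividing by the maximum is a gentler alternative.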

Do give it a try yourself.

Reference

Original article (Qiita), "Attention-style visualization deserves more credit in text mining": https://qiita.com/pocket_kyoto/items/63cf2ecfec84f329c8a9
