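[Python] How to Generate a WordCloud from Google Chrome Search History and Visualize It in Streamlit (Notes)

These are notes on using Streamlit to display a table of the most frequent words and a WordCloud built from Google Chrome search history data.

Preparation

Installing the libraries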

  pip install pandas
  pip install streamlit
  pip install janome
  pip install matplotlib
  pip install wordcloud

pandas : used to shape the data

streamlit : used to visualize the data

janome : used for morphological analysis

matplotlib : used to draw the WordCloud

wordcloud : used to generate the WordCloud

Obtain History, the Google Chrome search history data file.

Windows example

  C:\Users\XXXXXXXX\AppData\Local\Google\Chrome\User Data\Default\History
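
Chrome may keep the History file locked while the browser is running, so working on a copy tends to be safer than opening the file in place. The following is a minimal sketch of that copy step. The helper names `copy_history` and `default_history_path`, and the use of the `LOCALAPPDATA` environment variable to build the path, are illustrative assumptions and not part of the original article.

```python
import os
import shutil
from pathlib import Path


def default_history_path() -> Path:
    """Build the Windows location shown above (assumes the 'Default' profile)."""
    local_app_data = os.environ.get("LOCALAPPDATA", "")
    return Path(local_app_data) / "Google" / "Chrome" / "User Data" / "Default" / "History"


def copy_history(src: Path, dst: Path) -> Path:
    """Copy the History file from src to dst and return the destination path."""
    if not src.is_file():
        raise FileNotFoundError(f"History file not found: {src}")
    shutil.copy2(src, dst)
    return dst


# Usage (run on the machine that has Chrome installed):
#   copy_history(default_history_path(), Path("./History"))
```

Copying to `./History` matches the relative path that wc.py expects.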

Preparing a Japanese font for the WordCloud

On Windows: choose a font from the following folder.

C:/Windows/Fonts/...
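
To see which font files are available to choose from, the folder can be listed from Python. This is a small sketch only: the helper name `list_font_files` and the set of file extensions are assumptions, and the listing does not tell you which fonts actually contain Japanese glyphs, so that still has to be checked by hand.

```python
from pathlib import Path

# Extensions treated as font files in this sketch (an assumption)
FONT_SUFFIXES = {".ttf", ".ttc", ".otf"}


def list_font_files(folder: str) -> list[str]:
    """Return the sorted names of font files found directly in folder."""
    font_dir = Path(folder)
    if not font_dir.is_dir():
        # Folder does not exist (for example, on a non-Windows machine)
        return []
    return sorted(
        p.name
        for p in font_dir.iterdir()
        if p.is_file() and p.suffix.lower() in FONT_SUFFIXES
    )


# Print the candidates from the Windows font folder
for name in list_font_files("C:/Windows/Fonts"):
    print(name)
```

The chosen file name then goes into the `font_path` argument of `WordCloud` in wc.py.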

Code

wc.py

Place the History file in the same directory as wc.py.

import streamlit as st
from janome.tokenizer import Tokenizer
import collections
import pandas as pd
import sqlite3
from contextlib import closing
import matplotlib.pyplot as plt
from wordcloud import WordCloud

history = './History'

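# Read the page titles from the browser history database
def search_history():
    titles = []

    with closing(sqlite3.connect(history)) as conn:
        c = conn.cursor()
        # The urls table holds one row per visited URL, including its title
        statement = "SELECT title FROM urls"
        for result in c.execute(statement):
            # Skip rows whose title is empty or NULL
            if result[0]:
                titles.append(result[0])
    # Join with newlines so that adjacent titles do not run together
    return '\n'.join(titles)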


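# Run morphological analysis on the browser history text
def analyze_history(text: str):
    t = Tokenizer()

    # Count the frequent words: keep only tokens that janome tags as
    # '名詞,固有名詞' (noun, proper noun) and count their base forms
    freq_of_words = collections.Counter(token.base_form for token in t.tokenize(text)
                            if token.part_of_speech.startswith('名詞,固有名詞'))
    return freq_of_words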

# Generate the WordCloud
def generate_wordcloud(analyze_result: collections.Counter):
    dic_result = dict(analyze_result)
    # Specify the font file name (a font that contains Japanese glyphs)
    wordcloud = WordCloud(background_color='white',
                          font_path='C:/Windows/Fonts/...',
                          width=800, height=600).fit_words(dic_result)
    return wordcloud


if __name__ == "__main__":
    browser_history_text = search_history()
    freq_of_words = analyze_history(browser_history_text)
    wordcloud = generate_wordcloud(freq_of_words)
    st.title('Browser Search History Analysis')

    # Display the WordCloud.
    # Draw on an explicit figure and pass it to st.pyplot();
    # calling st.pyplot() with no figure is deprecated.
    fig, ax = plt.subplots()
    ax.imshow(wordcloud, interpolation='bilinear')
    ax.axis("off")
    fig.tight_layout()
    st.pyplot(fig)

    # Display the table of the 10 most frequent words
    df = pd.DataFrame(freq_of_words.most_common(10), columns=['Word', 'Count'])
    st.dataframe(df)
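
The title query can be tried without a real History file by building a small in-memory SQLite database. The sketch below assumes a reduced stand-in for the `urls` table with only the columns needed here; the actual Chrome table has more columns, and the helper name `read_titles` and the sample rows are made up for illustration.

```python
import sqlite3
from contextlib import closing


def read_titles(conn: sqlite3.Connection) -> list[str]:
    """Return the non-empty titles stored in the urls table."""
    rows = conn.execute("SELECT title FROM urls")
    return [title for (title,) in rows if title]


with closing(sqlite3.connect(":memory:")) as conn:
    # Reduced stand-in schema (assumption): only what this sketch needs
    conn.execute(
        "CREATE TABLE urls (id INTEGER PRIMARY KEY, url LONGVARCHAR, title LONGVARCHAR)"
    )
    conn.executemany(
        "INSERT INTO urls (url, title) VALUES (?, ?)",
        [
            ("https://example.com/a", "Streamlit docs"),
            ("https://example.com/b", ""),  # empty title, skipped by read_titles
            ("https://example.com/c", "WordCloud gallery"),
        ],
    )
    titles = read_titles(conn)

# Join with newlines, ready to hand to the tokenizer
print("\n".join(titles))
```

Keeping the query in a function that takes a connection makes it easy to point the same code at either the test database or the copied History file.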

Checking that it works

Launch

streamlit run wc.py

You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://10.4.113.198:8501

Reference information

How to add WordCloud graph in Streamlit

Reference

Original article: https://qiita.com/KWS_0901/items/17094f093a5c37970e85
