[Python Snippets] Libraries for Extractive Text Summarization

Sample text
http://news.steelcn.cn/a/105/...
Save the body of the article to content.txt.
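Every example below reads content.txt as UTF-8, so it can help to write the file with that encoding explicitly. A minimal sketch using only the standard library; the helper name save_sample_text is a hypothetical name chosen for illustration and is not part of any of the libraries covered here:

```python
from pathlib import Path


def save_sample_text(text, path="content.txt"):
    """Write the article body as UTF-8 and return the number of characters written."""
    # Hypothetical helper: strip surrounding whitespace so the summarizers
    # do not see leading or trailing blank lines.
    cleaned = text.strip()
    Path(path).write_text(cleaned, encoding="utf-8")
    return len(cleaned)
```

Calling save_sample_text(article_body) once, with the copied article text in article_body, should be enough to prepare the input for the snippets that follow.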
1. Textrank4zh
https://github.com/letiantian...
Installation

$ pip install textrank4zh

Example

import codecs
from textrank4zh import TextRank4Sentence

# Read the sample article as UTF-8 and close the file afterwards.
with codecs.open('content.txt', 'r', 'utf-8') as f:
    content = f.read()

tr4s = TextRank4Sentence()
tr4s.analyze(text=content, lower=True, source='all_filters')

# Print the index, weight, and text of the three highest-ranked sentences.
for item in tr4s.get_key_sentences(num=3):
    print(item.index, item.weight, item.sentence)

    
# Result:
# 0 0.11783211562891267     ，                    (ITmk3)     ，     (Steel Dynamics)        Hoyt Lakes 
# 6 0.09533764028919228                 ，   2010      50          
# 1 0.08828227247879757

2. FastTextRank
https://github.com/ArtistScri...
Installation

$ pip install FastTextRank

Example

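import codecs
from FastTextRank.FastTextRank4Sentence import FastTextRank4Sentence

# use_w2v=False: do not use word vectors for sentence similarity.
# tol: tolerance at which the ranking iteration stops.
mod = FastTextRank4Sentence(use_w2v=False, tol=0.0001)

sentence_number = 1  # number of sentences to return
with codecs.open('content.txt', 'r', 'utf-8') as f:
    content = f.read()
print(mod.summarize(content, sentence_number))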


# Result:
# ['    ，                    (ITmk3)     ，     (Steel Dynamics)        Hoyt Lakes           。']

3. Sumy
https://github.com/miso-belic...
Installation

$ pip install sumy

Example

from __future__ import absolute_import
from __future__ import division, print_function, unicode_literals

from sumy.parsers.html import HtmlParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.parsers.plaintext import PlaintextParser
from sumy.summarizers.lsa import LsaSummarizer as Summarizer
from sumy.nlp.stemmers import Stemmer
from sumy.utils import get_stop_words


LANGUAGE = "chinese"
SENTENCES_COUNT = 1


if __name__ == "__main__":
    url = "http://news.steelcn.cn/a/105/20100123/103370A9F83806.html"
    parser = HtmlParser.from_url(url, Tokenizer(LANGUAGE))
    # or for plain text files
    # parser = PlaintextParser.from_file("content.txt", Tokenizer(LANGUAGE))
    # parser = PlaintextParser.from_string("Check this out.", Tokenizer(LANGUAGE))
    stemmer = Stemmer(LANGUAGE)

    summarizer = Summarizer(stemmer)
    summarizer.stop_words = get_stop_words(LANGUAGE)

    for sentence in summarizer(parser.document, SENTENCES_COUNT):
        print(sentence)


# Result:
#        ，        、  、   、            ，           。

4. Gensim
https://github.com/RaRe-Techn...
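Installation
The gensim.summarization module was removed in Gensim 4.0, so this example needs an older release.

$ pip install "gensim<4.0"

Example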

import codecs
from gensim.summarization.summarizer import summarize

with codecs.open('content.txt', 'r', 'utf-8') as f:
    content = f.read()

# ratio=0.2 keeps roughly 20% of the sentences of the original text.
summary = summarize(content, ratio=0.2)
print(summary)


# Result:
#     ,    gensim

5. SnowNLP
https://github.com/isnowfy/sn...
Installation

$ pip install snownlp

Example

import codecs
from snownlp import SnowNLP

with codecs.open('content.txt', 'r', 'utf-8') as f:
    content = f.read()

s = SnowNLP(content)
print(s.keywords(3))  # top 3 keywords
print(s.summary(3))   # top 3 sentences

# Result:
# ['  ', ' ', '  ']
# ['          ', '             ', '     (Steel Dynamics)        Hoyt Lakes ']

6. Textteaser
It currently seems to support English only.

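import codecs
# Assumption: the Python port of TextTeaser is importable as `textteaser`.
from textteaser import TextTeaser

with codecs.open('content.txt', 'r', 'utf-8') as f:
    content = f.read()
title = ""  # the article title; left empty here

tt = TextTeaser()
summary = tt.summarize(title, content)
print(summary)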

Summary
All of the libraries above produce extractive summaries. I tried each of them once: Textrank4zh and FastTextRank seemed to give good results, followed by Sumy. I plan to add libraries for abstractive summarization later.
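Extractive summarizers score the sentences of the input and return the top-ranked ones unchanged. As a rough illustration of the TextRank idea that Textrank4zh and FastTextRank are named after, here is a minimal standard-library sketch. It assumes whitespace-separated words (Chinese text would first need a word segmenter), scores sentence pairs by word overlap divided by the sum of the log sentence lengths, and uses function names made up for this example. It is not the implementation of either library.

```python
import math
import re


def split_sentences(text):
    """Split on Western and Chinese sentence-ending punctuation."""
    parts = re.split(r"[.!?。！？]+", text)
    return [p.strip() for p in parts if p.strip()]


def similarity(words_a, words_b):
    """Word overlap normalised by the log lengths of both sentences."""
    denom = math.log(len(words_a)) + math.log(len(words_b))
    if denom <= 0:  # both sentences contain a single word
        return 0.0
    return len(set(words_a) & set(words_b)) / denom


def textrank_summary(text, num=1, damping=0.85, iterations=50):
    """Return the `num` highest-ranked sentences in their original order."""
    sentences = split_sentences(text)
    words = [s.lower().split() for s in sentences]
    n = len(sentences)
    # Weighted, undirected sentence graph stored as a dense matrix.
    weights = [[similarity(words[i], words[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    # Fixed number of PageRank-style updates over the sentence graph.
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:num]
    return [sentences[i] for i in sorted(top)]
```

Calling textrank_summary(content, num=3) on an English text returns three sentences taken verbatim from the input. Sentences that share no words with the rest of the text stay at the minimum score and are picked last.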
