[String Manipulation] Most Common Word

Using a list comprehension and a Counter object

from typing import List
import re
import collections


def most_common_word(paragraph: str, banned: List[str]) -> str:
    # 1. Replace non-word characters with spaces
    words = [
        word
        for word in re.sub(r"[^\w]", " ", paragraph).lower().split()
        if word not in banned
    ]
    # print(words)

    # 2. Create Counter Object
    counts = collections.Counter(words)
    # print(counts)

    # 3. Return most common word using most_common function
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."
    banned = ["hit"]

    print(most_common_word(paragraph, banned))

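The input mixes uppercase and lowercase letters with punctuation, so it needs data cleansing first.
The regular expression uses \w, which matches a word character (a letter, digit, or underscore), and the ^ at the start of the character class negates it. So [^\w] matches every character that is not a word character, and the code replaces each of those with a space.
A Counter object then tallies the remaining words, and most_common(1) returns the most frequent word together with its count, from which the word itself is extracted.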
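As a rough illustration of the cleansing step, the sketch below runs the same substitution on the sample paragraph and prints the intermediate results. The variable names cleaned and tokens are chosen here for illustration and do not appear in the original solution.

```python
import re

paragraph = "Bob hit a ball, the hit BALL flew far after it was hit."

# [^\w] matches any single character that is NOT a word character.
# Each match (comma, period, and also each space) becomes a space.
cleaned = re.sub(r"[^\w]", " ", paragraph)
print(repr(cleaned))

# Lowercase everything, then split on runs of whitespace.
tokens = cleaned.lower().split()
print(tokens)

# \W is the built-in shorthand for the same negated class.
print(re.sub(r"\W", " ", paragraph) == cleaned)
```

Because split() with no arguments collapses consecutive whitespace, the double space left behind by "ball," produces no empty tokens.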

The Counter object

counts = collections.Counter(words)

return counts.most_common(1)[0][0]

Step by step, the expression above evaluates as follows.

# counts object
Counter({'ball': 2, 'bob': 1, 'a': 1, 'the': 1, 'flew': 1, 'far': 1, 'after': 1, 'it': 1, 'was': 1})

# most_common(1)
[('ball', 2)]

# most_common(1)[0]
('ball', 2)

# most_common(1)[0][0]
ball

In other words, most_common(1)[0][0] is returned to unwrap the Counter result and extract only the most frequent word.
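The unwrapping can also be checked interactively. The following is a minimal sketch that starts from the word list left after removing "hit". The names top, pair, word, and empty are illustrative only.

```python
import collections

# Words from the sample paragraph after the banned word "hit" is removed.
words = ["bob", "a", "ball", "the", "ball", "flew", "far", "after", "it", "was"]
counts = collections.Counter(words)

top = counts.most_common(1)   # a list holding one (word, count) tuple
pair = top[0]                 # the (word, count) tuple itself
word = pair[0]                # just the word
print(top, pair, word)

# An empty Counter yields an empty list, so [0] would raise IndexError.
empty = collections.Counter()
print(empty.most_common(1))
```

One thing to keep in mind: if every word in the paragraph is banned, the list is empty and most_common(1)[0][0] raises IndexError, so this approach assumes at least one non-banned word exists. When several words share the highest count, the Python documentation states that elements with equal counts are ordered by first occurrence.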

References

파이썬 알고리즘 인터뷰 (Python Algorithm Interview)

Author And Source

Original post for this problem ([String Manipulation] Most Common Word): https://velog.io/@t1won/문자열-조작-가장-흔한-단어
