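[python] #12. Simple multithreading in Python

While practicing crawling

Most websites respond exactly the way they were built. Large portals like Naver or Kakao, though, show something different on every visit: the main banner, the top news, the search rankings, and so on.

So... I wanted to crawl the page 100 times in a row.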

requests.get("https://naver.com/").content.decode("utf-8", "replace")
requests.get("https://naver.com/").content.decode("utf-8", "replace")
requests.get("https://naver.com/").content.decode("utf-8", "replace")
requests.get("https://naver.com/").content.decode("utf-8", "replace")
requests.get("https://naver.com/").content.decode("utf-8", "replace")
requests.get("https://naver.com/").content.decode("utf-8", "replace")
...
...
...

I was not about to do it in such a brute-force way.

Still, for comparison

This fetches the Naver main page 20 times in a row and saves each response's HTML to a file.

import requests
from datetime import datetime

def makeFile(obj):
    # Fetch the Naver main page and save the response HTML to ./new/<obj>.txt
    # (the ./new/ directory must already exist)
    html = requests.get("https://naver.com/").content.decode("utf-8", "replace")
    with open('./new/' + str(obj) + '.txt', 'w', encoding='utf-8') as file:
        file.write(html)

print(datetime.now())

# Just 20 times as a test...
for item in range(20):
    makeFile(item)

print(datetime.now())

<!-- Result: it takes roughly 25 seconds. -->
2021-10-15 00:44:14.677403
2021-10-15 00:44:39.557023

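Declaring and using a thread pool

Where possible, I size the thread pool to match the number of hardware threads on my CPU (8 here). pool.map() hands the items to the worker threads, which process them concurrently; pool.close() then tells the pool that no more tasks are coming, and pool.join() waits until the workers have finished. Since each thread spends most of its time waiting on the network, the speedup comes from overlapping those waits rather than from working the CPU cores harder.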
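To see the map / close / join sequence in isolation, here is a minimal sketch that does not touch the network. `fake_request` is a made-up stand-in (not part of the original post) that simply sleeps for 0.2 seconds, the way a thread would sit blocked while waiting for an HTTP response; the pool size of 4 and the 8 items are arbitrary choices for the demo.

```python
import time
from multiprocessing.dummy import Pool as ThreadPool

def fake_request(name):
    """Hypothetical stand-in for an HTTP request: it only waits, like a thread blocked on the network."""
    time.sleep(0.2)
    return name + ".txt"

names = [str(i) for i in range(8)]

start = time.perf_counter()
pool = ThreadPool(4)                      # 4 worker threads (arbitrary for this demo)
results = pool.map(fake_request, names)   # blocks until every item has been processed
pool.close()                              # no more tasks will be submitted
pool.join()                               # wait for the worker threads to exit
elapsed = time.perf_counter() - start

print(results)                # results come back in the same order as the inputs
print(f"{elapsed:.2f}s")      # run one after another, the 8 sleeps alone would take 1.6s
```

With 4 workers, the 8 waits overlap in roughly two batches, so the total should land well under the 1.6 seconds a plain loop would need.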

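from multiprocessing.dummy import Pool as ThreadPool
import requests
from datetime import datetime

strList = []

def makeFile(obj):
    # Fetch the Naver main page and save the response HTML to ./new/<obj>.txt
    # (the ./new/ directory must already exist)
    html = requests.get("https://naver.com/").content.decode("utf-8", "replace")
    with open('./new/' + obj + '.txt', 'w', encoding='utf-8') as file:
        file.write(html)


# Just 20 times as a test...
for item in range(20):
    strList.append(str(item))


print(datetime.now())

pool = ThreadPool(8)                    # 8 worker threads
result = pool.map(makeFile, strList)    # blocks until all 20 calls have finished
pool.close()                            # no more tasks will be submitted
pool.join()                             # wait for the worker threads to exit

print(datetime.now())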

<!-- Result: requesting Naver 20 times and writing 20 files took about 3.5 seconds. -->
2021-10-15 00:31:53.541769
2021-10-15 00:31:57.080269
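For an exact comparison instead of eyeballing the timestamps, the two pairs printed above can be subtracted directly. This is only arithmetic on the numbers already shown; `elapsed_seconds` is a helper name made up for this sketch.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"   # the format datetime.now() prints in

def elapsed_seconds(start, end):
    """Seconds between two timestamps written in the datetime.now() print format."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()

# Timestamps copied from the two runs above
sequential = elapsed_seconds("2021-10-15 00:44:14.677403", "2021-10-15 00:44:39.557023")
pooled = elapsed_seconds("2021-10-15 00:31:53.541769", "2021-10-15 00:31:57.080269")

print(f"sequential: {sequential:.2f}s")          # sequential: 24.88s
print(f"pooled:     {pooled:.2f}s")              # pooled:     3.54s
print(f"speedup:    {sequential / pooled:.1f}x") # speedup:    7.0x
```

So the 8-thread pool came out roughly 7 times faster than the plain loop in these two runs.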

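Why companies...

...never stop worrying about crawling, I think I get it now. I used 20 requests as the example because I was afraid of getting blocked, but in practice even 100 finished in no time. Let's scrape websites in moderation. If you don't want to get blocked, that is.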
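One simple way to keep things moderate is to pause between requests. The sketch below is an assumption of mine and not something from the original post: `crawl_politely` and its `delay_seconds` parameter are hypothetical names, and the fetch function is passed in so the demo runs without hitting any real site.

```python
import time

def crawl_politely(fetch, count, delay_seconds=1.0):
    """Call fetch(i) `count` times in sequence, sleeping between calls (hypothetical helper)."""
    pages = []
    for i in range(count):
        pages.append(fetch(i))
        if i < count - 1:              # no need to wait after the last request
            time.sleep(delay_seconds)
    return pages

# Demo with a stand-in fetch function. With requests installed, something like
# lambda i: requests.get("https://naver.com/").text could be passed instead.
pages = crawl_politely(lambda i: f"<html>page {i}</html>", count=3, delay_seconds=0.1)
print(len(pages))
```

This deliberately gives up the thread pool's speed; how long a pause is reasonable depends on the site, so the 1-second default here is only a placeholder.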

Author And Source

For more on this topic ([python] #12. Simple multithreading in Python), see the original post: https://velog.io/@exoluse/python-12.-파이썬으로-멀티쓰레딩-간단-구현
