Summary of Problems Encountered While Writing a Web Crawler

1. Chinese text in the requested HTML is garbled (encoding problem)
import requests
from bs4 import BeautifulSoup
newsurl = "http://news.sina.com.cn/china"
res = requests.get(newsurl)
soup = BeautifulSoup(res.text,"lxml")
news_item = soup.select(".news-item")
print(news_item[0].select("h2")[0].text)

Result:
����������� �止�

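Solution:
requests decodes res.text with the encoding it infers from the response headers. When the header names no charset, it typically falls back to ISO-8859-1, so a UTF-8 page comes out garbled. Re-encoding the text with the guessed encoding and decoding the bytes as UTF-8 restores it.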
import requests
from bs4 import BeautifulSoup
newsurl = "http://news.sina.com.cn/china"
res = requests.get(newsurl)
# Undo the wrongly guessed decoding, then decode the original bytes as UTF-8
soup = BeautifulSoup(res.text.encode(res.encoding).decode('utf-8'), "lxml")
news_item = soup.select(".news-item")
print(news_item[0].select("h2")[0].text)

Result: the headline prints as readable Chinese text.
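
The encode/decode round trip above works because the original bytes are still recoverable from the wrongly decoded text. As a minimal sketch of the same idea, the raw bytes (res.content in requests) can be decoded directly instead. The helper name decode_body and its list of candidate encodings are assumptions chosen for Chinese-language pages; they are not part of requests.

```python
def decode_body(content: bytes, candidates=("utf-8", "gb18030")) -> str:
    """Decode raw response bytes, trying each candidate encoding in order.

    Hypothetical helper: the candidate list is an assumption suited to
    Chinese-language pages, not something requests provides.
    """
    for encoding in candidates:
        try:
            return content.decode(encoding)
        except UnicodeDecodeError:
            # This encoding does not fit the bytes; try the next one.
            continue
    # Last resort: decode with the first candidate and replace bad bytes.
    return content.decode(candidates[0], errors="replace")
```

With requests this would be called as decode_body(res.content). Setting res.encoding = "utf-8" before reading res.text is another common way to get the same effect when the page is known to be UTF-8.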

2. Connection error after the crawler has been running for a long time
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))

Solution 1: set the User-Agent request header:
import requests

# Start from the default requests headers and replace the User-Agent with a browser-like value
headers = requests.utils.default_headers()
headers['User-Agent'] = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36'
# Alternative User-Agent string:
#headers['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.167 Safari/537.36'
r = requests.get('https://academic.oup.com/journals', headers=headers)

Solution 2: change the IP address
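
One common way to change the IP address a site sees is to send requests through a proxy. The following is a minimal sketch, assuming a pool of proxy servers is available. The addresses in PROXY_POOL are placeholders, and proxy_rotation is a hypothetical helper name; only the shape of the proxies mapping (scheme to proxy URL) comes from requests.

```python
from itertools import cycle

# Placeholder proxy addresses (assumptions for illustration; replace with real ones).
PROXY_POOL = [
    "http://10.10.1.10:3128",
    "http://10.10.1.11:3128",
]


def proxy_rotation(pool):
    """Yield proxies mappings in the shape requests expects, cycling through the pool."""
    for address in cycle(pool):
        # Use the same proxy for both plain HTTP and HTTPS traffic.
        yield {"http": address, "https": address}


rotation = proxy_rotation(PROXY_POOL)
first = next(rotation)
second = next(rotation)
# Each mapping would then be passed along with the headers, for example:
# requests.get(url, headers=headers, proxies=first)
```

Taking a new mapping from the rotation for each request (or after each connection error) spreads the traffic across the addresses in the pool.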
