[Douban] Scraping comments (movie reviews, book reviews)

4401 words · python · learning

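0x00 Note


Adapted from a 56-line script for scraping Douban movie reviews, this post first scrapes Douban's short movie comments. The plan is to then adapt the code to fetch long movie reviews, and to add short and long book reviews.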

0x01 Movie reviews


1. Short comments


Code:
import requests
from urllib.parse import urlencode
import re
import csv
import time


# Fetch one page of short comments; num is the offset of the first comment on the page
def get_one(num):
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3724.8 Safari/537.36'

    }
    params = {
        'start': str(num),
        'limit': '20',
        'sort': 'new_score',
        'status': 'P',
        'percent_type': ''
    }
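    # 20444530 is the movie's Douban subject ID; change it to scrape a different movie
    base_url = 'https://movie.douban.com/subject/20444530/comments?'
    url = base_url + urlencode(params)
    print("Request URL: " + url)
    try:
        response = requests.get(url, headers=headers, timeout=10)
        if response.status_code == 200:
            return response.text
        print("Unexpected status code:", response.status_code)
    except requests.RequestException as e:
        # requests raises RequestException subclasses (timeouts, connection errors), not EOFError
        print(e)
    return None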


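# Parse one comment page into a list of dicts
def parse_page(html):
    info = []
    # Assumed pattern: matches Douban's comment markup (comment-info link,
    # comment-time title attribute, span.short); adjust it if the layout changes.
    # re.S lets '.' match newlines, and the lazy groups capture user, time, comment.
    pattern = re.compile(
        r'<span class="comment-info">.*?<a.*?>(.*?)</a>'
        r'.*?<span class="comment-time.*?title="(.*?)"'
        r'.*?<span class="short">(.*?)</span>', re.S)
    datas = re.findall(pattern, html)
    print(datas)
    for data in datas:
        comic = {}
        comic['User'] = data[0].strip()
        comic['Time'] = data[1].strip()
        # Collapse line breaks and repeated spaces so the CSV cell is one string, not a list
        comic['Comment'] = ' '.join(data[2].split())
        info.append(comic)
    return info


# Append the parsed comments to a CSV file
def write_to_file(info):
    # Output file; the movie title goes between 《 》. utf-8-sig keeps Chinese text readable in Excel
    with open('《 》 .csv', 'a', newline='', encoding='utf-8-sig') as f:
        fieldnames = ['User', 'Time', 'Comment']
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        # Append mode starts at the end of the file, so tell() == 0 only for an
        # empty file: write the header once instead of once per page
        if f.tell() == 0:
            writer.writeheader()
        writer.writerows(info)


# Main routine: fetch 10 pages of 20 comments each
def main():
    for i in range(10):
        html = get_one(i * 20)
        if html is None:
            # Skip pages that failed to download instead of crashing in parse_page
            continue
        datas = parse_page(html)
        write_to_file(datas)
        print('Page {} done.'.format(i + 1))
        # Pause for 1 second between requests
        time.sleep(1)


if __name__ == '__main__':
    main()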
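It helps to check the parsing step offline before crawling. The sketch below runs a regex of the same shape (three lazy groups for user, time and comment, compiled with `re.S` so `.` also matches newlines) against a small hand-written snippet. `SAMPLE_HTML`, `PATTERN` and `parse_comments` are hypothetical names, and the markup and class names (`comment-info`, `comment-time`, `short`) are assumptions modeled loosely on a Douban comment item, not copied from the live page.

```python
import re

# Hypothetical, simplified markup loosely modeled on Douban comment items.
# The real page has many more elements and attributes.
SAMPLE_HTML = '''
<div class="comment-item">
  <span class="comment-info">
    <a href="https://www.douban.com/people/example/">alice</a>
    <span class="comment-time " title="2019-05-01 12:00:00">2019-05-01</span>
  </span>
  <p class="comment-content"><span class="short">Great
  film.</span></p>
</div>
<div class="comment-item">
  <span class="comment-info">
    <a href="https://www.douban.com/people/example2/">bob</a>
    <span class="comment-time " title="2019-05-02 08:30:00">2019-05-02</span>
  </span>
  <p class="comment-content"><span class="short">Too long.</span></p>
</div>
'''

# Assumed pattern: one lazy group per field; re.S so '.' also matches newlines.
PATTERN = re.compile(
    r'<span class="comment-info">.*?<a.*?>(.*?)</a>'
    r'.*?<span class="comment-time.*?title="(.*?)"'
    r'.*?<span class="short">(.*?)</span>',
    re.S,
)


def parse_comments(html):
    """Turn each (user, time, comment) match into a dict, collapsing whitespace."""
    return [
        {'User': user.strip(), 'Time': when.strip(), 'Comment': ' '.join(text.split())}
        for user, when, text in PATTERN.findall(html)
    ]


if __name__ == '__main__':
    for row in parse_comments(SAMPLE_HTML):
        print(row)
```

If the live page yields an empty list, compare its HTML source with the snippet and adjust the class names in the pattern.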

2. Long reviews


Code:
To be added later.

0x02 Book reviews


1. Short comments


Code:
import requests
from urllib.parse import urlencode
import re
import csv
import time


# Fetch one page of hot short comments; num is the page number
def get_one(num):
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3724.8 Safari/537.36'

    }
    params = {
        'p': str(num)
    }
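    # 24838578 is the book's Douban subject ID; change it to scrape a different book
    base_url = 'https://book.douban.com/subject/24838578/comments/hot?'
    url = base_url + urlencode(params)
    print("Request URL: " + url)
    try:
        response = requests.get(url, headers=headers, timeout=10)
        if response.status_code == 200:
            return response.text
        print("Unexpected status code:", response.status_code)
    except requests.RequestException as e:
        # requests raises RequestException subclasses (timeouts, connection errors), not EOFError
        print(e)
    return None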


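# Parse one comment page into a list of dicts
def parse_page(html):
    info = []
    # Assumed pattern: matches Douban's book comment markup (comment-info link,
    # span.short); adjust it if the layout changes. Book comments capture no time.
    pattern = re.compile(
        r'<span class="comment-info">.*?<a.*?>(.*?)</a>'
        r'.*?<span class="short">(.*?)</span>', re.S)
    datas = re.findall(pattern, html)
    # print(datas)
    for data in datas:
        comic = {}
        comic['User'] = data[0].strip()
        # comic['Time'] = data[1].strip()
        # Collapse line breaks and repeated spaces so the CSV cell is one string, not a list
        comic['Comment'] = ' '.join(data[1].split())
        # print(comic)
        info.append(comic)
    return info


# Append the parsed comments to a CSV file
def write_to_file(info):
    # Output file; the book title goes between 《 》. utf-8-sig keeps Chinese text readable in Excel
    with open('《 》 .csv', 'a', newline='', encoding='utf-8-sig') as f:
        fieldnames = ['User', 'Comment']
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        # Append mode starts at the end of the file, so tell() == 0 only for an
        # empty file: write the header once instead of once per page
        if f.tell() == 0:
            writer.writeheader()
        writer.writerows(info)


# Main routine: fetch 100 pages of hot comments
def main():
    for i in range(0, 100):
        html = get_one(i)
        if html is None:
            # Skip pages that failed to download instead of crashing in parse_page
            continue
        datas = parse_page(html)
        write_to_file(datas)
        print('Page {} done.'.format(i))
        # Uncomment to pause 1 second between requests
        # time.sleep(1)


if __name__ == '__main__':
    main()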
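The two scripts differ mainly in how they page through results: the movie script passes an offset (`get_one(i * 20)` with `limit=20`), while the book script passes a page number (`p=i`). Below is a minimal sketch isolating just the URL construction, reusing the base URLs and query parameters from the code above. The helper names `movie_page_url` and `book_page_url` are hypothetical, and the code only builds URLs without sending any requests.

```python
from urllib.parse import urlencode

# Base URLs taken from the two scripts above (same subject IDs).
MOVIE_BASE = 'https://movie.douban.com/subject/20444530/comments?'
BOOK_BASE = 'https://book.douban.com/subject/24838578/comments/hot?'


def movie_page_url(page, page_size=20):
    """Movie comments paginate by offset: page n starts at n * page_size."""
    params = {
        'start': str(page * page_size),
        'limit': str(page_size),
        'sort': 'new_score',
        'status': 'P',
        'percent_type': '',
    }
    return MOVIE_BASE + urlencode(params)


def book_page_url(page):
    """Book hot comments paginate by page number via the 'p' parameter."""
    return BOOK_BASE + urlencode({'p': str(page)})


if __name__ == '__main__':
    print(movie_page_url(0))
    print(movie_page_url(2))
    print(book_page_url(3))
```

Keeping the offset arithmetic in one helper makes it harder to pass a page number where an offset is expected, which is the easiest mistake to make when porting one script to the other.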

2. Long reviews


Code:
To be added later.
