Scraping movie information from Movie Heaven (dytt8.net)

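I spent the whole morning writing a crawler that walks Movie Heaven and collects its movie links. It uses regular expressions. Overall it works reasonably well. Here is the code. Features implemented so far:
1. Publication date of the movie
2. Movie title
3. Year
4. Country of origin
5. Category
6. Subtitles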
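I am still deciding whether this many fields are really necessary. There is also the question of scraping Douban for ratings. The page layout on Movie Heaven is not quite consistent, though; the pages were obviously written by two different people. Awkward.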
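Since the layout varies between pages, one way to make field extraction less fragile is to look each field up by its label instead of by its position. The following is only a minimal sketch under assumptions: `parse_fields`, `LABELS`, and the sample text are hypothetical, and the Chinese labels are an assumption about how the site names its fields, not something stated in this post. The only detail taken from the crawler is that fields are introduced by the `◎` marker.

```python
import re

# Assumed mapping from on-page labels to dictionary keys. The labels are an
# assumption about the site's pages and should be checked against real HTML.
LABELS = {
    "译名": "translated_title",
    "片名": "title",
    "年代": "year",
    "产地": "origin",
    "类别": "category",
}


def _label_pattern(label):
    # Allow any whitespace (including full-width spaces) between label characters.
    return re.compile(r"\s*".join(re.escape(ch) for ch in label))


PATTERNS = {key: _label_pattern(label) for label, key in LABELS.items()}


def parse_fields(text):
    """Return a dict of the recognised fields in a '◎'-delimited text."""
    fields = {}
    for chunk in text.split("◎")[1:]:  # text before the first marker is not a field
        chunk = chunk.strip()
        for key, pattern in PATTERNS.items():
            match = pattern.match(chunk)
            if match:
                fields[key] = chunk[match.end():].strip()
                break
    return fields


# Hypothetical sample: the fields are out of order and 'origin' is missing.
sample = (
    "intro◎片\u3000\u3000名\u3000Example Title\n"
    "◎类\u3000\u3000别\u3000Drama\n"
    "◎年\u3000\u3000代\u30002017\n"
)
print(parse_fields(sample))
```

A page that omits or reorders a field then simply yields a dictionary without that key, rather than shifting every later field into the wrong variable.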
To reach my goal, the crawler needs to capture the movie title, category, poster, and download address. These are basically all done. The code so far:

from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup
from time import sleep
import re

moviesLinks = set()        # detail-page links already visited, so none is fetched twice


def getLinks(pageUrl):
    global moviesLinks
    html = urlopen(pageUrl)
    soup = BeautifulSoup(html, "html.parser")  # the page is HTML, so use an HTML parser rather than "xml"
    #print(soup.prettify())

    # Detail-page links look like /html/gndy/<letters>/<digits>/<digits>.html
    for link in soup.find_all("a", {"href": re.compile(r"/html/gndy/+[a-z]+/[0-9]+/[0-9]+\.html")}):
        if link.attrs['href'] not in moviesLinks:
            newLink = link.attrs['href']
            print(newLink)
            moviesLinks.add(newLink)
            getPageImformation(newLink)
            sleep(1)  # short pause between requests


def getPageImformation(pageUrl):
    url = 'http://www.dytt8.net' + pageUrl  # pageUrl already starts with "/"
    try:
        html = urlopen(url)
    except HTTPError as e:
        print("Could not open", url, e)
        return
    soup = BeautifulSoup(html, "html.parser")
    try:
        date = soup.find("div", {"class": "co_content8"}).ul.get_text().strip()  # text block containing the publication date
        date = date.split('\n')[0]  # keep only the first line
    except AttributeError:
        print("Could not find the publication date, skipping")
    try:
        poster = soup.find("div", {"id": "Zoom"}).img.attrs['src']
        print(poster)
    except AttributeError:
        print('Could not find the poster, skipping')
    # The remaining fields come from splitting the description text on '◎'
    # and slicing off the four-character label at the start of each piece.
    try:
        fields = soup.find("div", {"id": "Zoom"}).p.get_text().split('◎')
    except AttributeError:
        fields = []
    try:
        name = fields[1][4:].strip()  # movie title
    except IndexError:
        print("Could not find the movie title, skipping")
    try:
        year = fields[3][4:].strip()
    except IndexError:
        print("Could not find the year, skipping")
    try:
        origin = fields[4][4:].strip()
        print(origin)
    except IndexError:
        print("Could not find the country of origin, skipping")
    try:
        category = fields[5][4:].strip()
        print(category)
    except IndexError:
        print("Could not find the category, skipping")
    print("--------------------------------\n")
    try:
        downloadLink = soup.find("td", {"bgcolor": "#fdfddf"}).a.get_text()
        print(downloadLink)
    except AttributeError:
        print("Could not find the download link, skipping")


getLinks('http://www.dytt8.net/')


# Field positions after splitting on '◎':
#  3 year
#  4 country of origin
#  5 category
#  6
#  7
#  8 IMDb
#
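The link-matching step in `getLinks` can be tried without touching the network. The sketch below reuses the same regular expression on a list of href strings, removes duplicates, and builds absolute URLs with `urljoin`. The helper name `collect_detail_urls` and the sample hrefs are hypothetical and only serve as an illustration.

```python
import re
from urllib.parse import urljoin

BASE_URL = "http://www.dytt8.net/"
# Same pattern as in the crawler: /html/gndy/<letters>/<digits>/<digits>.html
DETAIL_LINK = re.compile(r"/html/gndy/+[a-z]+/[0-9]+/[0-9]+\.html")


def collect_detail_urls(hrefs, seen=None):
    """Return new absolute detail-page URLs, skipping any already in `seen`."""
    seen = set() if seen is None else seen
    fresh = []
    for href in hrefs:
        if not DETAIL_LINK.search(href):
            continue  # not a movie detail page
        url = urljoin(BASE_URL, href)
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh


# Hypothetical hrefs, as they might appear on a listing page.
hrefs = [
    "/html/gndy/dyzz/20170801/54613.html",
    "/html/gndy/dyzz/index.html",
    "/html/gndy/dyzz/20170801/54613.html",
    "/about.html",
]
print(collect_detail_urls(hrefs))
```

Passing the same `seen` set across several listing pages keeps one detail page from being fetched twice, which is the role `moviesLinks` plays in the crawler.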


You are free to share or copy this text, but please keep the URL of this document as the reference URL.

Licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
