How to parse web page information in Python with BeautifulSoup

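This example shows how to use BeautifulSoup in Python to parse the contents of a web page. It is shared here for reference; the details are as follows.
The Python code below fetches a page, finds every link on it, and prints the contents of each span tag whose class is titletext.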


    # Note: this is Python 2 code (urllib2 and BeautifulSoup 3)
    # Import the library used to query a website
    import urllib2
    # Import the Beautiful Soup class used to parse the returned HTML
    from BeautifulSoup import BeautifulSoup

    # Specify the URL you want to query
    url = "http://www.python.org"
    # Query the website and read the HTML into the variable 'html'
    html = urllib2.urlopen(url).read()
    # Parse the HTML and store it in Beautiful Soup format
    soup = BeautifulSoup(html)
    # soup.head is the head tag and soup.head.title is the title tag
    print soup.head
    print soup.head.title
    # Print the length of the downloaded page with the len function
    print len(html)
    # Find all the links (anchor tags) on the page
    tags = soup.findAll('a')
    # Print all the links
    print tags
    # Find every span whose class is 'titletext' and print its contents
    titles = soup.findAll('span', attrs={'class': 'titletext'})
    for title in titles:
        print title.contents
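
The listing above targets Python 2, where the modules are named urllib2 and BeautifulSoup. As a rough sketch of how the same steps might look on Python 3, the block below assumes the third-party beautifulsoup4 package (imported as bs4) is installed. The names fetch_html, summarize, and SAMPLE are hypothetical and are not part of the original article; the sample HTML is made up so the example runs without network access.

```python
# Python 3 sketch; assumes the third-party package beautifulsoup4 is installed.
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its raw bytes (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read()


def summarize(html):
    """Parse HTML and collect the pieces the Python 2 listing prints."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text() if soup.title else None
    # href value of every anchor tag that has one
    links = [a["href"] for a in soup.find_all("a", href=True)]
    # text of every span that carries the class 'titletext'
    titles = [s.get_text() for s in soup.find_all("span", class_="titletext")]
    return {"title": title, "links": links, "titles": titles}


# Inline sample document (made up for illustration) so no network is needed.
SAMPLE = """
<html><head><title>Demo</title></head>
<body>
  <a href="/about">About</a>
  <a name="top">No href here</a>
  <span class="titletext">First</span>
  <span class="other">Skipped</span>
  <span class="titletext big">Second</span>
</body></html>
"""

result = summarize(SAMPLE)
print(result["title"])
print(result["links"])
print(result["titles"])
```

To run it against a live page, pass the downloaded bytes to the parser, for example summarize(fetch_html("http://www.python.org")). In bs4, the class_ filter matches a span when titletext is any one of its classes, so a span with class "titletext big" is included as well.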

I hope this article helps you with your Python programming.

You may freely share or copy this text, but please keep this document's URL as the reference URL.

Licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
