Parsing HTML with BeautifulSoup in a Python crawler

BeautifulSoup parses HTML and XML strings.
[Figure: description of the BeautifulSoup object's parameters]
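
The contents of the parameter figure are not recoverable here, so the following is only a minimal sketch of the constructor arguments, assuming the bs4 package is installed. It passes the markup, the parser name, and (for byte input only) `from_encoding`. The sample markup is made up for illustration.

```python
from bs4 import BeautifulSoup

# Case 1: the markup is already a str, so no encoding hint is needed.
soup_text = BeautifulSoup("<p>caf\u00e9</p>", "html.parser")

# Case 2: the markup is raw bytes (for example an HTTP response body).
# from_encoding tells BeautifulSoup which encoding to try when decoding.
raw = "<p>caf\u00e9</p>".encode("utf-8")
soup_bytes = BeautifulSoup(raw, "html.parser", from_encoding="utf-8")

# Both routes should yield the same text content.
print(soup_text.p.string == soup_bytes.p.string)
print(soup_bytes.original_encoding)
```

Passing `from_encoding` together with a str makes BeautifulSoup ignore the hint and emit a warning, so it is best reserved for byte input.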

Example:


#!/usr/bin/python
# -*- coding: UTF-8 -*-
from bs4 import BeautifulSoup
import re

# Sample HTML document to be parsed
html_doc = """
<html>
<head>
  <title>The Dormouse's story</title>
</head>
<body>
<p class="title aq">
  <b>
    The Dormouse's story
  </b>
</p>

<p class="story">Once upon a time there were three little sisters; and their names were
  <a href="http://example.com/elsie" rel="external nofollow" class="sister" id="link1">Elsie</a>,
  <a href="http://example.com/lacie" rel="external nofollow" class="sister" id="link2">Lacie</a> 
  and
  <a href="http://example.com/tillie" rel="external nofollow" class="sister" id="link3">Tillie</a>;
  and they lived at the bottom of a well.
</p>

<p class="story">...</p>
"""


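# Create a BeautifulSoup object from the HTML string
# (html_doc is already a str, so from_encoding is not needed)
soup = BeautifulSoup(html_doc, 'html.parser')

# Print the title tag
print(soup.title)

# Print the name of the title tag
print(soup.title.name)

# Print the text inside the title tag
print(soup.title.string)

# Print the name of the title tag's parent
print(soup.title.parent.name)

# Print the first p tag
print(soup.p)

# Print the class attribute of the first p tag (a list of class names)
print(soup.p['class'])

# Print the href attribute of the first a tag
print(soup.a['href'])

# The soup object can be modified in place: attributes can be changed,
# added, or deleted, and later output reflects the changes.

# Change the href attribute of the first a tag to http://www.baidu.com/
soup.a['href'] = 'http://www.baidu.com/'

# Add a name attribute to the first a tag (placeholder value)
soup.a['name'] = 'placeholder'

# Delete the class attribute of the first a tag
del soup.a['class']

# Print all child nodes of the first p tag
print(soup.p.contents)

# Print the first a tag
print(soup.a)

# Print all a tags, returned as a list
print(soup.find_all('a'))

# Print the tag whose id is link3
print(soup.find(id="link3"))

# Print all text in the document
print(soup.get_text())

# Print all attributes of the first a tag as a dict
print(soup.a.attrs)


for link in soup.find_all('a'):
  # Print the href attribute of each link
  print(link.get('href'))

# Iterate over the direct children of soup.p
for child in soup.p.children:
  print(child)

# Regular expression match: find every tag whose name contains "b"
for tag in soup.find_all(re.compile("b")):
  print(tag.name)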
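
The searches above can also be narrowed with filters. The following is a minimal sketch, assuming the bs4 package is installed and using a shortened, hypothetical fragment modelled on the example document. It filters by CSS class, by an attribute regular expression, and by result count.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical fragment modelled on the example document.
html = """
<p class="story">
  <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
  <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
  <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

# Filter by CSS class; the keyword is class_ because class is reserved in Python.
sisters = soup.find_all("a", class_="sister")
names = [a.get_text() for a in sisters]

# Filter on an attribute value with a regular expression.
lacie = soup.find("a", href=re.compile(r"lacie$"))

# Limit the number of results returned.
first_two = soup.find_all("a", limit=2)

print(names)
print(lacie["id"])
print(len(first_two))
```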

Crawler design approach:
[Figure: crawler design approach]
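
The design figure itself is not available, so the sketch below is only one common way to structure a small crawler and is not necessarily the layout the figure showed. It assumes a URL manager (queue plus seen set), a downloader, and a BeautifulSoup-based parser. The `FAKE_PAGES` dict, the function names, and the example.com URLs are all hypothetical, and the downloader reads from memory instead of the network so the example stays self-contained.

```python
from collections import deque
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical in-memory "site" standing in for real HTTP downloads.
FAKE_PAGES = {
    "http://example.com/": '<title>Home</title><a href="/a">A</a><a href="/b">B</a>',
    "http://example.com/a": '<title>Page A</title><a href="/b">B</a>',
    "http://example.com/b": '<title>Page B</title><a href="/">Home</a>',
}


def download(url):
    """Downloader: return the HTML for url, or None if it is unavailable."""
    return FAKE_PAGES.get(url)


def parse(url, html):
    """Parser: extract the page title and the absolute outgoing links."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.string if soup.title else None
    links = [urljoin(url, a["href"]) for a in soup.find_all("a", href=True)]
    return title, links


def crawl(start_url, max_pages=10):
    """URL manager and main loop: breadth-first, visiting each URL once."""
    queue = deque([start_url])
    seen = {start_url}
    results = {}
    while queue and len(results) < max_pages:
        url = queue.popleft()
        html = download(url)
        if html is None:
            continue
        title, links = parse(url, html)
        results[url] = title
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return results


titles = crawl("http://example.com/")
print(titles)
```

In a real crawler the `download` function would perform an HTTP request; the rest of the loop would stay the same.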

Detailed manual:
https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/


