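beautifulsoup4 tutorial (3): traversing and searching the document tree
4. Navigating the document tree
4.1 Direct child nodes
.contents
A tag's .contents attribute returns the tag's direct children as a list. Because it is a list, individual children can be retrieved by index.
# -*- coding: utf-8 -*-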
from bs4 import BeautifulSoup
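html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
"""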
# Create the BeautifulSoup object
# It can also be created from a local HTML file, for example:
# soup = BeautifulSoup(open('index.html'))
soup = BeautifulSoup(html, features="lxml")
print(soup.body.contents)
print(soup.body.contents[1])
result:
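[u'\n', <p class="title" name="dromouse"><b>The Dormouse's story</b></p>, u'\n', <p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>, u'\n', <p class="story">...</p>, u'\n']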
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
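.children
The .children attribute gives the same direct children as an iterator rather than a list, so loop over it to reach each child.
print(soup.head.children)
for child in soup.body.children:
    print(child)
result:
<listiterator object at 0x00000000039C3080>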
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
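Whitespace between tags is parsed as text nodes, so .children and .contents include newline strings alongside the tags. A minimal sketch of keeping only the tag children, assuming a shortened, hypothetical markup and the built-in html.parser in place of lxml (written for Python 3):

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical, shortened markup; html.parser is used here instead of lxml.
html = "<body>\n<p>one</p>\n<p>two</p>\n</body>"
soup = BeautifulSoup(html, "html.parser")

# .children is an iterator, so materialize it to count and reuse the nodes.
children = list(soup.body.children)

# Keep only Tag objects, dropping the newline strings between the tags.
tag_children = [child for child in children if isinstance(child, Tag)]

print(len(children))
print([tag.name for tag in tag_children])
```

Filtering with isinstance is one option among several; searching with find_all(recursive=False) would also return only tags.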
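4.2 All descendant nodes
.descendants
Whereas .contents and .children cover only direct children, the .descendants attribute iterates recursively over every descendant of a tag. It is a generator.
print(soup.head.descendants)
for child in soup.descendants:
    print(child)
result:
<generator object descendants at 0x0000000003970E58>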
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
<head><title>The Dormouse's story</title></head>
<title>The Dormouse's story</title>
The Dormouse's story
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<b>The Dormouse's story</b>
The Dormouse's story
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>
Elsie
,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
Lacie
and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
Tillie
;
and they lived at the bottom of a well.
<p class="story">...</p>
...
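4.3 Node content
If a tag has a single string as its only child, the .string attribute returns that string.
print(soup.title)
print(soup.title.string)
result:
<title>The Dormouse's story</title>
The Dormouse's story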
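If a tag's only child is another tag, .string returns that child's .string.
print(soup.head)
print(soup.head.string)
result:
<head><title>The Dormouse's story</title></head>
The Dormouse's story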
print(soup.body)
print(soup.body.string)
result:
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body>
None
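.strings and .stripped_strings
When a tag contains more than one child, .string cannot tell which one is meant and returns None. In that case use the .strings or .stripped_strings attribute instead; both return generators.
print(soup.strings)
for string in soup.strings:
    print(string)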
result:
<generator object _all_strings at 0x0000000003170E58>
The Dormouse's story
The Dormouse's story
Once upon a time there were three little sisters; and their names were
,
Lacie
and
Tillie
;
and they lived at the bottom of a well.
...
The .stripped_strings attribute of a Tag object returns the same strings with blank lines removed and surrounding whitespace stripped.
print(soup.stripped_strings)
for string in soup.stripped_strings:
    print(string)
result:
<generator object stripped_strings at 0x00000000030D0E58>
The Dormouse's story
The Dormouse's story
Once upon a time there were three little sisters; and their names were
,
Lacie
and
Tillie
;
and they lived at the bottom of a well.
...
4.4 Direct parent node
The .parent attribute returns an element's direct parent.
p = soup.p
print(p.parent.name)
result:
body
content = soup.head.title.string
print(content)
print(content.parent.name)
result:
The Dormouse's story
title
4.5 All parent nodes
The .parents attribute walks up through all of an element's ancestors. It is also a generator.
content = soup.head.title.string
print(content)
for parent in content.parents:
    print(parent.name)
result:
The Dormouse's story
title
head
html
[document]
4.6 Sibling nodes
The .next_sibling and .previous_sibling attributes return the next sibling node and the previous sibling node, respectively.
print(soup.p.next_sibling)
print(soup.a.previous_sibling)
print(soup.p.next_sibling.next_sibling)
result:
Once upon a time there were three little sisters; and their names were
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
4.7 All sibling nodes
The .next_siblings and .previous_siblings attributes iterate over all of the siblings that follow or precede the current node.
for sibling in soup.a.next_siblings:
    print(sibling)
result:
,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
;
and they lived at the bottom of a well.
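The example above covers only .next_siblings. A minimal sketch of the opposite direction with .previous_siblings, assuming a compact, hypothetical markup and the built-in html.parser in place of lxml (written for Python 3):

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened markup; html.parser is used here instead of lxml.
html = '<p><a id="link1">Elsie</a>, <a id="link2">Lacie</a> and <a id="link3">Tillie</a>;</p>'
soup = BeautifulSoup(html, "html.parser")

last_link = soup.find("a", id="link3")

# .previous_siblings walks backward, starting with the nearest sibling.
previous = list(last_link.previous_siblings)
for node in previous:
    print(repr(node))
```

The text between the links counts as a sibling too, so the loop alternates between strings and tags.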
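4.8 Previous and next elements
Unlike .next_sibling and .previous_sibling, the .next_element and .previous_element attributes ignore nesting level: they return whichever node was parsed immediately after or before the current one (nodes on the same level are the siblings). For example, given the markup
<head><title>The Dormouse's story</title></head>
the next element after the head tag is
<title>The Dormouse's story</title>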
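The difference between elements and siblings can be sketched as follows, assuming a shortened, hypothetical markup and the built-in html.parser in place of lxml (written for Python 3):

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened markup; html.parser is used here instead of lxml.
html = "<html><head><title>The Dormouse's story</title></head><body><p>text</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

head = soup.head
next_el = head.next_element                  # first node parsed after <head> opened: <title>
next_sib = head.next_sibling                 # next node on the same level: <body>
title_text = soup.title.next_element         # the string inside <title>
before_title = soup.title.previous_element   # the node parsed just before <title>: <head>

print(next_el.name, next_sib.name, before_title.name)
print(repr(title_text))
```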
4.9 All previous and next elements
The .next_elements and .previous_elements attributes iterate forward or backward through the document in parse order.
soup = BeautifulSoup(html, features="lxml")
for element in soup.a.next_elements:
    print(repr(element))
result:
u' Elsie '
u',\n'
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>
u'Lacie'
u' and\n'
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>
u'Tillie'
u';\nand they lived at the bottom of a well.'
u'\n'
<p class="story">...</p>
u'...'
u'\n'
u'\n'
u'\n'
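5. Searching the document tree
5.1 find_all
find_all(name, attrs, recursive, text, **kwargs)
The name argument matches tags by name. Passing a string finds every tag with exactly that name.
print(soup.find_all('a'))
result:
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]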
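Passing a regular expression matches tag names against the pattern; here, every tag whose name starts with "b".
import re
for tag in soup.find_all(re.compile("^b")):
    print(tag)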
result:
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body>
<b>The Dormouse's story</b>
Passing a list matches any tag whose name is in the list.
print(soup.find_all(["a", "b"]))
result:
[<b>The Dormouse's story</b>, <a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
Passing True matches every tag, but none of the text strings.
for tag in soup.find_all(True):
    print(tag.name)
result:
html
head
title
body
p
b
p
a
a
a
p
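Passing a function applies it to each tag; the function takes a tag and returns True for the tags to keep.
def has_class_but_no_id(tag):
    return tag.has_attr('class') and not tag.has_attr('id')
print(soup.find_all(has_class_but_no_id))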
result:
[<p class="title" name="dromouse"><b>The Dormouse's story</b></p>, <p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>, <p class="story">...</p>]
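Keyword arguments filter on tag attributes: soup.find_all(id='link2') finds every tag whose id attribute is "link2". Attribute values can also be matched with a regular expression.
import re
print(soup.find_all(id='link2'))
print(soup.find_all(href=re.compile("elsie")))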
result:
[<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>]
Because class is a reserved word in Python, filter on the CSS class with the class_ keyword.
print(soup.find_all(class_="sister"))
result:
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
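Some attribute names, such as the data-* attributes in HTML5, cannot be used as keyword arguments. They can be matched through the attrs dictionary instead, as in soup.find_all(attrs={"data-foo": "value"}).
data_soup = BeautifulSoup('<div data-foo="value">foo!</div>', features="lxml")
print(data_soup.find_all(attrs={"data-foo": "value"}))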
result:
[<div data-foo="value">foo!</div>]
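The text argument searches the document's strings instead of its tags. Like name, it accepts a string, a list, or a regular expression.
import re
print(soup.a)
print(soup.find_all(text="Elsie"))
print(soup.find_all(text=["Tillie", "Elsie", "Lacie"]))
print(soup.find_all(text="story"))
print(soup.find_all(text="The Dormouse's story"))
print(soup.find_all(text=re.compile("story")))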
result:
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>
[]
[u'Lacie', u'Tillie']
[]
[u"The Dormouse's story", u"The Dormouse's story"]
[u"The Dormouse's story", u"The Dormouse's story"]
The limit argument stops the search after the given number of matches.
print(soup.find_all("a"))
print("===============")
print(soup.find_all("a", limit=2))
result:
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
===============
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
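By default find_all searches all descendants of a tag. With recursive=False it searches only the tag's direct children.
print(soup.body)
print("===============================")
print(soup.body.find_all("a", recursive=False))
print(soup.body.find_all("a"))
result:
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body>
===============================
[]
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
In this example every a tag is nested inside a p tag, so a search limited to the direct children of body finds no a tag.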
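5.2 find
find(name, attrs, recursive, text, **kwargs)
find takes the same arguments as find_all but returns only the first match instead of a list.
print(soup.body.find_all("a"))
print(soup.body.find("a"))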
result:
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>
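The two methods also differ when nothing matches. A minimal sketch, assuming a shortened, hypothetical markup and the built-in html.parser in place of lxml (written for Python 3):

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened markup; html.parser is used here instead of lxml.
html = '<body><p class="story"><a id="link1">Elsie</a> <a id="link2">Lacie</a></p></body>'
soup = BeautifulSoup(html, "html.parser")

first = soup.find("a")                   # a single Tag
limited = soup.find_all("a", limit=1)    # a one-element list holding the same Tag
no_tag = soup.find("table")              # None when nothing matches
no_tags = soup.find_all("table")         # an empty list when nothing matches

print(first["id"], len(limited), no_tag, no_tags)
```

Because find can return None, check its result before accessing attributes on it.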
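5.3 find_parents and find_parent
find_all and find search an element's descendants; find_parents and find_parent search its ancestors instead.
print(soup.body.find_all("a"))
print(soup.body.find("a"))
print(soup.body.find_parents("a"))
print(soup.body.find_parents("html"))
result:
[<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>
[]
[<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1"><!-- Elsie --></a>,
<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>]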
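Searching upward is easier to see when starting from a deeply nested node. A minimal sketch, assuming a shortened, hypothetical markup and the built-in html.parser in place of lxml (written for Python 3):

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened markup; html.parser is used here instead of lxml.
html = '<html><body><p class="story"><a id="link1">Elsie</a></p></body></html>'
soup = BeautifulSoup(html, "html.parser")

link = soup.a
text = link.string                        # the string "Elsie" inside the link

enclosing_link = text.find_parent("a")    # nearest <a> ancestor of the string
enclosing_ps = text.find_parents("p")     # every <p> ancestor, nearest first
body = link.find_parent("body")           # a more distant ancestor
own_name = link.find_parents("a")         # empty: an element is not its own ancestor
missing = link.find_parent("table")       # None when no ancestor matches

print(enclosing_link["id"], [p["class"] for p in enclosing_ps], body.name)
print(own_name, missing)
```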
5.4 find_next_siblings and find_next_sibling
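These two methods search the siblings that come after an element: find_next_siblings returns every match and find_next_sibling only the first. A minimal sketch, assuming a compact, hypothetical markup and the built-in html.parser in place of lxml (written for Python 3):

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened markup; html.parser is used here instead of lxml.
html = '<p><a id="link1">Elsie</a>, <a id="link2">Lacie</a> and <a id="link3">Tillie</a>;</p>'
soup = BeautifulSoup(html, "html.parser")

first_link = soup.find("a", id="link1")

later_links = first_link.find_next_siblings("a")   # all following <a> siblings
next_link = first_link.find_next_sibling("a")      # only the first of them

print([tag["id"] for tag in later_links])
print(next_link["id"])
```

Unlike the plain .next_sibling attribute, these methods skip over the text nodes between the links because the search is filtered by tag name.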
5.5 find_previous_siblings and find_previous_sibling
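These two methods mirror the previous pair and search the siblings that come before an element. A minimal sketch, assuming a compact, hypothetical markup and the built-in html.parser in place of lxml (written for Python 3):

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened markup; html.parser is used here instead of lxml.
html = '<p><a id="link1">Elsie</a>, <a id="link2">Lacie</a> and <a id="link3">Tillie</a>;</p>'
soup = BeautifulSoup(html, "html.parser")

last_link = soup.find("a", id="link3")

earlier_links = last_link.find_previous_siblings("a")   # all preceding <a> siblings, nearest first
previous_link = last_link.find_previous_sibling("a")    # only the nearest one

print([tag["id"] for tag in earlier_links])
print(previous_link["id"])
```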
5.6 find_all_next and find_next
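These two methods search everything that comes after an element in the document, not only its siblings: find_all_next returns every match and find_next only the first. A minimal sketch, assuming a compact, hypothetical markup and the built-in html.parser in place of lxml (written for Python 3):

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened markup; html.parser is used here instead of lxml.
html = ('<body><p><a id="link1">Elsie</a>, <a id="link2">Lacie</a></p>'
        '<p class="story">...</p></body>')
soup = BeautifulSoup(html, "html.parser")

first_link = soup.find("a", id="link1")

following_links = first_link.find_all_next("a")   # every later <a> in the document
next_p = first_link.find_next("p")                # first <p> that starts after the link
sibling_p = first_link.find_next_sibling("p")     # None: that <p> is not a sibling of the link

print([tag["id"] for tag in following_links])
print(next_p["class"], sibling_p)
```

The p tag that contains the link is not returned, because it starts before the link rather than after it.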
This text may be freely shared or copied, but please keep this document's URL as the reference URL.
Licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.