K-Digital Training (Big Data): Day 8

Today we covered web crawling. It was my first time, so nothing felt familiar and I fumbled around a lot, but it became fun once I got used to it.

Steps

1. Install selenium

pip install selenium

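2. Download ChromeDriver
Check your Chrome version before downloading. In my case a driver older than the browser was fine, but a newer one did not work; the safe choice is the driver whose major version matches the installed Chrome.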
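A minimal sketch of that version check, assuming you paste in the text printed by `chrome --version` and `chromedriver --version`. The helper names `major_version` and `versions_match` are made up for illustration, and the version strings are only example values.

```python
import re


def major_version(version_text: str) -> int:
    """Return the leading number of the first dotted version in the text."""
    match = re.search(r"(\d+)\.\d+", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return int(match.group(1))


def versions_match(chrome_text: str, driver_text: str) -> bool:
    """True when the browser and the driver share the same major version."""
    return major_version(chrome_text) == major_version(driver_text)


# Example strings only; substitute the output from your own machine.
print(versions_match("Google Chrome 96.0.4664.110", "ChromeDriver 96.0.4664.45"))  # True
print(versions_match("Google Chrome 96.0.4664.110", "ChromeDriver 97.0.4692.71"))  # False
```

If the result is False, download the driver build that matches the browser rather than the newest one.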
3. Verify that it works

from selenium import webdriver
import time
driver = webdriver.Chrome()
driver.get('URL')
time.sleep(3)  # pause so the page can finish loading before anything else is sent
driver.quit()  # close the browser and end the WebDriver session

Finding what to crawl
Press F12 to open the developer tools and check the class or tag of the element you want.

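from selenium.webdriver.common.by import By
e = driver.find_elements(By.CLASS_NAME, 'CLASS_NAME')  # 'CLASS_NAME' is a placeholder; the singular find_element did not find it for me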

If you want to move to a different address:
driver.get('URL')

BeautifulSoup also needs to be installed.

pip install beautifulsoup4
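To check the install without opening a browser, BeautifulSoup can be tried on a small HTML string. This is only a sketch: the HTML below is made up for illustration and is not real Naver markup, and the class name is borrowed from the search script later in this post.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a search results page.
html = """
<ul>
  <li><a class="api_txt_lines" href="https://example.com/1">First puppy post</a></li>
  <li><a class="api_txt_lines" href="https://example.com/2">Second puppy post</a></li>
  <li><span class="api_txt_lines">No link here</span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# select() takes a CSS selector; ".api_txt_lines" matches every tag with that class.
results = [
    (tag.get_text(strip=True), tag.get("href"))  # get() returns None when href is missing
    for tag in soup.select(".api_txt_lines")
]

for text, href in results:
    print(text, href)
```

The third item shows why a printed href can be None: the class matched a tag that is not a link.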

Searching Naver for "puppy" and fetching the results

from selenium import webdriver
# from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
from urllib.parse import quote_plus  # percent-encodes the search term (needed for Korean)
import time


# baseUrl = 'https://www.google.com/search?q='
baseUrl = 'https://search.naver.com/search.naver?where=nexearch&sm=top_hty&fbm=0&ie=utf8&query='
plusUrl = input('Enter a search term: ')


url = baseUrl + quote_plus(plusUrl)  # quote_plus is required so the term is safe to put in a URL

print(url)

driver = webdriver.Chrome()
driver.get(url)

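html = driver.page_source
driver.quit()  # the HTML is already captured, so the browser can be closed
soup = BeautifulSoup(html, 'html.parser')  # parse the HTML into a tree that can be searched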

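# encoding='utf-8' keeps Korean text intact whatever the OS default is
with open("a_text.txt", 'w', encoding='utf-8') as f:
    # soup.select('h3') would select by tag name instead of by class
    titleLists = soup.select('.api_txt_lines')
    for title in titleLists:
        print(title.text)
        print(title.get('href'))  # None if the matched tag has no href
        f.write(title.text + "\n")  # save each title, one per line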
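The script above depends on quote_plus to make the search term safe for a URL. A small sketch of what it actually produces, using only the standard library and the same base URL:

```python
from urllib.parse import quote_plus

# Same base URL as in the script above.
base_url = 'https://search.naver.com/search.naver?where=nexearch&sm=top_hty&fbm=0&ie=utf8&query='

korean = quote_plus('강아지')       # each UTF-8 byte becomes %XX
spaced = quote_plus('cute puppy')  # a space becomes '+'

print(korean)  # %EA%B0%95%EC%95%84%EC%A7%80
print(spaced)  # cute+puppy
print(base_url + korean)
```

Without the encoding, Korean characters and spaces would go into the query string raw, which is why the term is encoded before it is appended.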

Author and Source

Original post: https://velog.io/@y7y1h13/K-digitial-training-8일차
