Playstore Web Scraping with Python


Download Jupyter Notebook Here

GitHub Reference

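Last month I learned how to scrape data from websites with Python from a YouTube tutorial by the CEO of Jovian, and I built a web scraping project with what I learned.

So I thought I would share my personal web scraping project.

It is a data analysis project built on data scraped from the Playstore, and I chose the Nigerian fintech niche as my focus.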

Step 1:
I scraped my data from the Google Play Store using Selenium and ChromeDriverManager.

Import the required libraries

import time
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

Access the PlayStore with Selenium

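# Selenium 4 takes the driver path through a Service object
from selenium.webdriver.chrome.service import Service

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://play.google.com/store/search?q=fintech%20in%20nigeria&c=apps')
time.sleep(10)  # give the search results time to load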
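
The search URL passed to driver.get has its query percent-encoded by hand. As a minimal sketch, assuming you want to try other search terms, the same URL can be built with the standard library. `build_search_url` and `PLAY_SEARCH_URL` are hypothetical names used for illustration, not part of Selenium or of the original notebook.

```python
from urllib.parse import quote, urlencode

PLAY_SEARCH_URL = "https://play.google.com/store/search"


def build_search_url(query, category="apps"):
    """Return a Play Store search URL for `query` (hypothetical helper)."""
    # quote_via=quote encodes spaces as %20, matching the hand-written URL
    params = urlencode({"q": query, "c": category}, quote_via=quote)
    return f"{PLAY_SEARCH_URL}?{params}"


url = build_search_url("fintech in nigeria")
print(url)
```

The result can be passed straight to driver.get(url).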

Load every app on the results page

SCROLL_PAUSE_TIME = 5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")
time.sleep(SCROLL_PAUSE_TIME)

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
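
The loop above keeps scrolling until the page height stops changing. A rough sketch of the same idea as a reusable function follows. The height lookup and the scroll action are passed in as callables so the logic can be tried without a browser, and `max_rounds` is an added safety limit that the original loop does not have. The function name and its parameters are assumptions made for this example.

```python
import time


def scroll_until_stable(get_height, scroll_to_bottom, pause=5.0, max_rounds=50):
    """Scroll until the page height stops growing; return the number of scrolls.

    get_height and scroll_to_bottom are callables supplied by the caller.
    max_rounds is an added guard against pages that never stop growing.
    """
    last_height = get_height()
    for rounds in range(1, max_rounds + 1):
        scroll_to_bottom()
        time.sleep(pause)  # wait for lazily loaded results
        new_height = get_height()
        if new_height == last_height:
            return rounds
        last_height = new_height
    return max_rounds


# Stand-in for a page that grows twice and then stops
heights = iter([1000, 2000, 3000, 3000])
rounds = scroll_until_stable(lambda: next(heights), lambda: None, pause=0)
print(rounds)
```

With Selenium, get_height would wrap driver.execute_script("return document.body.scrollHeight") and scroll_to_bottom would wrap the window.scrollTo call used above.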

Scrape the app links

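from selenium.webdriver.common.by import By

links_fintech = []
elems = driver.find_elements(By.XPATH, "//a[@href]")
for elem in elems:
    href = elem.get_attribute("href")
    # App detail pages are the links that contain "details?id"
    if "details?id" in href:
        links_fintech.append(href)

# Remove duplicate links while keeping their order
links_fintech = list(dict.fromkeys(links_fintech))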
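
De-duplicating on the raw href treats two links to the same app as different whenever their query strings differ. A possible refinement, sketched here with the standard library only, is to key on the package id from the `id` query parameter. `app_ids_from_links` is a hypothetical helper and the sample links use made-up package ids.

```python
from urllib.parse import parse_qs, urlparse


def app_ids_from_links(hrefs):
    """Return the unique package ids from Play Store detail links, in order."""
    ids = []
    for href in hrefs:
        parsed = urlparse(href)
        # Only app detail pages carry the package id we want
        if not parsed.path.endswith("/details"):
            continue
        values = parse_qs(parsed.query).get("id")
        if values:
            ids.append(values[0])
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(ids))


# Made-up links for illustration
sample_links = [
    "https://play.google.com/store/apps/details?id=com.example.bank",
    "https://play.google.com/store/apps/details?id=com.example.bank&hl=en",
    "https://play.google.com/store/apps/details?id=com.example.wallet",
    "https://play.google.com/store/search?q=fintech",
]
print(app_ids_from_links(sample_links))
```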

Scrape the required information from each app

from selenium.webdriver.common.by import By

list_all_elements = []
for iteration in links_fintech:
    try:
        driver.get(iteration)
        print(iteration)
        time.sleep(3)

        header1 = driver.find_element(By.TAG_NAME, "h1")

        # Keep every other "htlgb" element as a value
        downloads = driver.find_elements(By.CLASS_NAME, "htlgb")
        list_downloads = []
        for x in range(len(downloads)):
            if x % 2 == 0:
                list_downloads.append(downloads[x].text)

        titles = driver.find_elements(By.CLASS_NAME, "AHFaub")
        comments = driver.find_element(By.CLASS_NAME, "EymY4b")

        # Look values up by title so every row has the same five fields
        download_count = None
        email = None
        for x in range(len(titles)):
            if titles[x].text == "Download":
                download_count = list_downloads[x]
            if titles[x].text == "Developer":
                for y in list_downloads[x].split("\n"):
                    if "@" in y:
                        email = y
                        break

        list_all_elements.append(
            [iteration, header1.text, download_count, comments.text.split()[0], email])
    except Exception as e:
        # Report the failure and move on to the next app
        print(e)

Create a CSV file from the scraped DataFrame

import pandas as pd

df = pd.DataFrame(list_all_elements, columns=['URL', 'Name', 'downloads', 'install', 'email'])
# to_csv returns None when given a path, so display df itself afterwards
df.to_csv('fintech_playstore.csv', header=True, index=False, encoding="utf-8")

df
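
Going back to the scraping loop: it pairs each section title with a value by position and then searches the developer block for an e-mail address. A small sketch of that pairing as plain functions, which can be checked without opening a browser, might look like this. The helper names and the sample titles and values are invented for illustration and are not taken from a real Play Store page.

```python
def fields_by_title(titles, values):
    """Pair each section title with the value at the same position."""
    return dict(zip(titles, values))


def first_email(text):
    """Return the first line of `text` that contains "@", or None."""
    for line in text.split("\n"):
        if "@" in line:
            return line.strip()
    return None


# Invented sample data standing in for the scraped element texts
titles = ["Updated", "Download", "Developer"]
values = ["May 1, 2022", "100,000+", "Visit website\ndev@example.com\nPrivacy Policy"]

info = fields_by_title(titles, values)
row = [info.get("Download"), first_email(info.get("Developer", ""))]
print(row)
```

Looking fields up by title gives None for a missing section instead of shifting the remaining columns, which keeps every row the same length.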

The dataset can be downloaded from my GitHub reference.

I then visualized the data using PowerBI.


Original article (Playstore Web Scraping with Python): https://dev.to/designegycreatives/playstore-web-scraping-with-python-3o3o
