A Python package with Selenium to download high-resolution images using Google Search by Image


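highres

A Python package that downloads high-resolution images using Google's Search by Image feature

Prerequisites:

Setup instructions for each item are given below.

Python with the selenium pip package

A browser with its WebDriver

Installing selenium via pip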

pip install selenium

Setting up the WebDriver

Download the driver for your browser from its official website.

Chrome

Firefox

Edge

Adding the WebDriver folder to the system PATH

Add the folder that contains the downloaded driver to the system PATH so that Selenium can find it.
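A quick way to confirm that the PATH change took effect is to look the driver executables up from Python. The sketch below is only an illustration: the helper name `find_drivers` is hypothetical, and the executable names (`chromedriver`, `geckodriver`, `msedgedriver`) are the usual defaults, which may differ on your system.

```python
import shutil

# Usual driver executable names (an assumption; adjust for your setup).
DRIVER_EXECUTABLES = {
    "Chrome": "chromedriver",
    "Firefox": "geckodriver",
    "Edge": "msedgedriver",
}


def find_drivers(executables=DRIVER_EXECUTABLES):
    """Map each browser name to the driver's full path, or None if it is not on PATH."""
    return {browser: shutil.which(exe) for browser, exe in executables.items()}


if __name__ == "__main__":
    for browser, path in find_drivers().items():
        print(f"{browser}: {path or 'driver not found on PATH'}")
```

Any browser reported as not found either has no driver installed or its folder is not yet on the PATH.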

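To do

Before we start coding, let's break the goal into smaller components.

We need code for:

Input and output folders

Identifying image files in a folder

Detecting and launching a browser

Navigating to and using Google's 'Search by Image'

Downloading the image from its URL and saving it to a folder

Creating the pip package

Code for input and output folders

We could store paths in plain string variables, but Path from pathlib is OS-neutral and offers many useful features. For example, to join paths we can use the / operator instead of concatenating strings with + "//" +.

So, let's import it.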

import os
from pathlib import Path

# In the completed code these values come from command-line input.
images_folder_path = Path(r'/home/dr/Pictures/low-res')
highres_folder_name = Path(r'highres')
success_folder_name = Path(r'old')
error_folder_name = Path(r'error')

# Validate the input folder first; the sub-folders are created inside it.
if not os.path.exists(images_folder_path):
    raise SystemExit("{} is an invalid folder".format(images_folder_path))

# Create the output folders if they do not exist yet.
if not os.path.exists(images_folder_path / highres_folder_name):
    os.mkdir(images_folder_path / highres_folder_name)
if not os.path.exists(images_folder_path / success_folder_name):
    os.mkdir(images_folder_path / success_folder_name)
if not os.path.exists(images_folder_path / error_folder_name):
    os.mkdir(images_folder_path / error_folder_name)

image_extensions = [".jpeg", ".jpg", ".png", ".bmp"]
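Since the paths are already Path objects, the folder setup can also be written with pathlib alone. This is a minimal sketch of that alternative, not the package's actual code; the helper name `prepare_folders` is hypothetical, and the folder names are the ones used in this article.

```python
import tempfile
from pathlib import Path


def prepare_folders(images_folder, names=("highres", "old", "error")):
    """Validate the input folder and create the output sub-folders inside it."""
    images_folder = Path(images_folder)
    if not images_folder.is_dir():
        raise SystemExit(f"{images_folder} is an invalid folder")
    created = {}
    for name in names:
        folder = images_folder / name   # '/' joins paths in an OS-neutral way
        folder.mkdir(exist_ok=True)     # no error if the folder already exists
        created[name] = folder
    return created


if __name__ == "__main__":
    # Demonstrated on a throwaway directory so nothing real is touched.
    with tempfile.TemporaryDirectory() as tmp:
        folders = prepare_folders(tmp)
        print(sorted(path.name for path in folders.values()))
```

`mkdir(exist_ok=True)` replaces the separate existence check, so calling the function twice is harmless.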

Identifying image files in a folder

As mentioned earlier, Path from pathlib has many useful features.
One of them is easy access to a file's extension, for example:

Path('/home/dr/Pictures/image.jpeg').suffix # returns ".jpeg"

# Get every entry in the input folder.
files = os.listdir(images_folder_path)

# Keep only the files whose extension is in this list
# (the comparison is case-insensitive), using a list comprehension.
image_extensions = [".jpeg", ".jpg", ".png", ".bmp"]

files = [file for file in files if Path(file).suffix.lower() in image_extensions]
if len(files) == 0:
    raise SystemExit("No images found in the input folder")
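The same filtering can be done with `Path.iterdir()`, which also makes it easy to skip sub-folders whose names happen to end in an image extension. This is a sketch under that assumption; `find_images` is a hypothetical helper name, not part of the package.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpeg", ".jpg", ".png", ".bmp"}


def find_images(folder, extensions=IMAGE_EXTENSIONS):
    """Return the sorted names of the image files directly inside `folder`."""
    return sorted(
        entry.name
        for entry in Path(folder).iterdir()
        # Skip directories and compare extensions case-insensitively.
        if entry.is_file() and entry.suffix.lower() in extensions
    )
```

An empty result means there is nothing to process, which is the case the SystemExit above guards against.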

Code for detecting and launching a browser

Normally, we can open a web page with just these three lines:

from selenium import webdriver
browser = webdriver.Firefox()
browser.get("https://dev.to")

However, we want to support the most widely used browsers. So we try to launch one browser, and if that raises an error we move on to the next.

browser = None
url = r"https://google.com/images"

# Try each browser in turn; the first one that starts is used.
if browser is None:
    try:
        browser = webdriver.Chrome()
        browser.get(url)
    except Exception as err:
        print(err)
if browser is None:
    try:
        browser = webdriver.Firefox()
        browser.get(url)
    except Exception as err:
        print(err)
if browser is None:
    try:
        browser = webdriver.Safari()
        browser.get(url)
    except Exception as err:
        print(err)
if browser is None:
    try:
        browser = webdriver.Edge()
        browser.get(url)
    except Exception as err:
        print(err)
if browser is None:
    raise SystemExit(
        "Browser is not available or Webdriver is not set up properly")

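The four near-identical blocks above can be collapsed into a loop over candidate browsers. The following is a sketch of that idea rather than the package's code: `first_available` and `open_browser` are hypothetical names, and Selenium is imported lazily so the fallback logic itself can be tried without a browser installed.

```python
def first_available(factories):
    """Try each (name, factory) pair in order and return (name, result) for the first success."""
    errors = []
    for name, factory in factories:
        try:
            return name, factory()
        except Exception as err:
            # Remember why this candidate failed and move on to the next one.
            errors.append(f"{name}: {err}")
    raise SystemExit(
        "Browser is not available or Webdriver is not set up properly\n"
        + "\n".join(errors))


def open_browser(url="https://google.com/images"):
    """Start the first browser that works and load `url` in it."""
    from selenium import webdriver  # imported here so first_available has no Selenium dependency

    name, browser = first_available([
        ("Chrome", webdriver.Chrome),
        ("Firefox", webdriver.Firefox),
        ("Safari", webdriver.Safari),
        ("Edge", webdriver.Edge),
    ])
    print("using browser:", name)
    browser.get(url)
    return browser
```

Adding or reordering browsers then only means editing the list passed to `first_available`.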
Navigating Google Images search

Before clicking an element, we have to wait until it has fully loaded.
For that, we define a helper method that we will reuse throughout the navigation code.

from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


def wait_for_element(driver, byType, byValue, maxWait: int):
    """
    Look for an element on the page using the given locator.
    Return the element if it is found within maxWait seconds, otherwise None.

    Args:
        driver (webdriver): Selenium webdriver
        byType (By): locator strategy used to find the element, e.g. By.XPATH or By.ID
        byValue (str): value of the locator
        maxWait (int): maximum number of seconds to wait for the element

    Returns:
        WebElement: the identified element, or None on timeout
    """
    try:
        element = WebDriverWait(driver, maxWait).until(
            EC.presence_of_element_located((byType, byValue)))
        return element
    except TimeoutException:
        print("Element not found")
        return None
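Conceptually, `WebDriverWait(...).until(...)` keeps calling a condition until it returns something truthy or the timeout elapses. The plain-Python sketch below illustrates that polling idea only; it is not Selenium's implementation, and `wait_until` and `poll_interval` are hypothetical names. Like `wait_for_element`, it returns None on timeout instead of raising.

```python
import time


def wait_until(condition, max_wait, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value or `max_wait` seconds pass."""
    deadline = time.monotonic() + max_wait
    while True:
        result = condition()
        if result:
            return result            # the condition was met
        if time.monotonic() >= deadline:
            return None              # timed out, mirroring wait_for_element
        time.sleep(poll_interval)    # pause briefly before checking again
```

In the real helper, Selenium handles this loop and raises TimeoutException, which `wait_for_element` turns into a None return value.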

Now, in the actual navigation code, we fetch the elements and click them one by one.

import time

# `browser` is the webdriver started above; `file` is one entry from `files`.
browser.get(url)
print("working on file:", file)
file_name = Path(file)
# These absolute XPaths depend on Google's page markup and may need updating when it changes.
image_icon = wait_for_element(browser,
                              By.XPATH, "/html/body/div[2]/div[2]/div[2]/form/div[2]/div[1]/div[1]/div/div[3]/div[2", 10)
image_icon.click()
upload_image_link = wait_for_element(browser,
                                     By.XPATH, "/html/body/div[2]/div[2]/div[2]/div/div[2]/form/div[1]/div/a", 10)
upload_image_link.click()
browse_image_button = wait_for_element(browser,
                                       By.XPATH, '//*[@id="awyMjb"]', 10)
# Sending the file path to the file input uploads the image.
browse_image_button.send_keys(str(images_folder_path / file))
try:
    all_sizes_button = wait_for_element(browser,
                                        By.XPATH, '/html/body/div[7]/div[2]/div[10]/div[1]/div[2]/div/div[2]/div[1]div/div[1]/div[2]/div[2]/span[1]/a', 10)
    all_sizes_button.click()
    first_image = wait_for_element(browser,
                                   By.XPATH, '/html/body/div[2]/c-wiz/div[3]/div[1]/div/div/div/div/div[1]/div[1]/di[1]/a[1]/div[1]/img', 10)
    first_image.click()
    time.sleep(wait_time)  # wait_time (seconds) is defined elsewhere in the completed code
    preview_image_link = wait_for_element(browser,
                                          By.XPATH, '/html/body/div[2]/c-wiz/div[3]/div[2]/div[3]/div/div/div[3]/di[2]/c-wiz/div[1]/div[1]/div/div[2]/a/img', 10)
    src = preview_image_link.get_attribute('src')
except Exception as err:
    # The except branch is not shown in this excerpt; this minimal handler reports the failure and stops.
    raise SystemExit("failed for file: {} ({})".format(file, err))

Saving the image file

import urllib.request

urllib.request.urlretrieve(
    src, images_folder_path / highres_folder_name / file_name)
# os.renames moves the file and automatically creates any missing folders in the destination path.
os.renames(images_folder_path / file_name,
           images_folder_path / success_folder_name / file_name)
print("success for file:", file)
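The download and the move can be wrapped in one helper that also handles a failed download. The sketch below assumes that originals whose download fails should go to the `error` folder created earlier; the article does not show that branch, so treat it as a guess. `save_result` is a hypothetical name, and `shutil.move` is used here in place of `os.renames`.

```python
import shutil
import urllib.request
from pathlib import Path


def save_result(src, images_folder, file_name,
                highres="highres", success="old", error="error"):
    """Download `src` into the highres folder, then file the original under success or error."""
    images_folder = Path(images_folder)
    original = images_folder / file_name
    try:
        urllib.request.urlretrieve(src, images_folder / highres / file_name)
    except (OSError, ValueError) as err:
        # URLError and HTTPError are OSError subclasses; ValueError covers malformed URLs.
        print("failed for file:", file_name, err)
        target, ok = images_folder / error, False
    else:
        print("success for file:", file_name)
        target, ok = images_folder / success, True
    target.mkdir(exist_ok=True)
    shutil.move(str(original), str(target / file_name))
    return ok
```

The return value lets a calling loop count successes and failures without inspecting the folders.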

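Creating the pip package

We want the script to run from the command line, so we put the code above into appropriate functions, create a handler method, place everything in a module, and define a command-line entry point like this.

Note: for simplicity, the function definitions and argparse code are not included here; the completed code on Github is linked at the end.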

# command_line.py
from highresmodule import highres

def main():
    highres.highres_handler()
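Since the argparse code is omitted here, the following sketch shows roughly what such argument parsing could look like. The argument names (`images_folder`, `--wait-time`) and the default value are assumptions made for illustration; they are not taken from the package.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse command-line arguments (hypothetical names, for illustration only)."""
    parser = argparse.ArgumentParser(
        prog="highres",
        description="Download high-resolution versions of the images in a folder.")
    parser.add_argument("images_folder", type=Path,
                        help="folder containing the low-resolution images")
    parser.add_argument("--wait-time", type=float, default=2.0,
                        help="seconds to wait for the preview image to load")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["/home/dr/Pictures/low-res", "--wait-time", "3"])
    print(args.images_folder, args.wait_time)
```

A handler such as `highres_handler` could call `parse_args()` with no arguments to read `sys.argv`.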

Finally, the most important part: the pip setup code. We use setuptools to create the pip package as follows.

# setup.py
import os
import setuptools


def read(fname):
    # for reading the description
    return open(os.path.join(os.path.dirname(__file__), fname)).read()


setuptools.setup(
    name='highres',
    version='1.0.3',
    author="Dilli Babu R",
    author_email="[email protected]",
    description='A script to download hi-resolution images using google search by image feature',
    long_description=read('pip-readme.rst'),
    long_description_content_type="text/markdown",
    url="https://dillir07.github.io/highres/",
    packages=['highresmodule'],
    install_requires=['selenium'], #specifying dependency
    entry_points={
        "console_scripts": ['highres=highresmodule.command_line:main']
    },
    classifiers=[
        "Programming Language :: Python :: 3",
        "Operating System :: OS Independent",
    ],
)
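The `console_scripts` entry `highres=highresmodule.command_line:main` tells the installer to create a `highres` command that imports `highresmodule.command_line` and calls its `main` function. The sketch below shows how such a string maps to a callable; it is an illustration, not pip's or setuptools' actual implementation, and `resolve_entry_point` is a hypothetical name.

```python
import importlib


def resolve_entry_point(spec):
    """Split 'command=package.module:function' and return (command, function object)."""
    command, _, target = spec.partition("=")
    module_name, _, attr = target.strip().partition(":")
    module = importlib.import_module(module_name.strip())
    return command.strip(), getattr(module, attr.strip())


if __name__ == "__main__":
    # A standard-library target is used so the example runs without installing the package.
    command, func = resolve_entry_point("loads=json:loads")
    print(command, func("[1, 2]"))
```

With the package installed, the spec from setup.py would resolve to the `main` function shown in command_line.py.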

Now we can build and install the package locally with the following command.

python setup.py develop

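If everything works, you can upload the package to the PyPI registry by following the pip package creation instructions.

The final code is available on Github, and a demo is available at highres.

As usual, comments are welcome. Thanks for reading :)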

Reference

Original article: https://dev.to/dillir07/a-python-package-with-selenium-to-download-high-res-image-using-google-search-by-image-6ok
