Steal Some Memes with Python


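Hello :D

I like memes and I want to keep them on my phone. Is the solution to browse memes and download them one by one by hand?

No. Let's steal (that is, download) some memes automatically with Python.
Which website should we steal the memes from? 🎯

Our target is https://imgflip.com/

First, let's look at the HTML of the page.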

<div class="name">
BLA BLAB 
BLAB LBA
BL BLA

<img src="MEME URL">

Every meme link sits inside a <div class="base-img-wrap">.......</div>.

So we need to find the div tags with the class name base-img-wrap and pull the <img> tag out of each of them.

<div class='base-img-wrap'>
BLA BLA LBA
<img src="MEME LINK">
</div>
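To make the rule above concrete (only take an <img> that sits inside a base-img-wrap div), here is a minimal sketch that runs on a hard-coded HTML string instead of the live site. It uses the standard library's html.parser rather than bs4, purely so it runs with nothing installed. The class name WrapImageParser, the SAMPLE string, and the example.invalid URL are made up for illustration.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class WrapImageParser(HTMLParser):
    """Collect the src of every <img> nested inside <div class="base-img-wrap">."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # 0 = outside a wrapper div, >0 = nesting depth inside one
        self.sources = []   # src values found inside wrapper divs

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth == 0:
            # Outside a wrapper: only a matching div opens one.
            classes = (attrs.get("class") or "").split()
            if tag == "div" and "base-img-wrap" in classes:
                self.depth = 1
            return
        if tag == "img" and attrs.get("src"):
            self.sources.append(attrs["src"])
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if self.depth > 0 and tag not in VOID_TAGS:
            self.depth -= 1


# Hypothetical sample modeled on the structure shown above.
SAMPLE = """
<div class="base-img-wrap" style="width:440px">
  <a class="base-img-link" href="/i/5aq7jq">
    <img class="base-img" src="//i.imgflip.com/5aq7jq.jpg"/>
  </a>
</div>
<img src="//example.invalid/not-a-meme.png">
"""

parser = WrapImageParser()
parser.feed(SAMPLE)
print(parser.sources)  # ['//i.imgflip.com/5aq7jq.jpg']
```

The second <img> is skipped because it is not inside a wrapper div. That filtering is what the bs4 calls in the rest of this post do for us in one line.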

Required modules

requests (for HTTP/S requests)

bs4 (for parsing HTML)

Let's start by sending an HTTP request to the site and collecting the div blocks that wrap each base-img-wrap.

import requests
from bs4 import BeautifulSoup

req = requests.get('https://imgflip.com/?page=1').content
soup = BeautifulSoup(req, "html.parser")
ancher = soup.find_all('div', {'class': "base-unit clearfix"})

"""
<div class="base-unit clearfix"><h2 class="base-unit-title"><a href="/i/5aq7jq">Why is my sister's name Rose</a></h2><div class="base-img-wrap-wrap"><div class="base-img-wrap" style="width:440px"><a class="base-img-link" href="/i/5aq7jq" style="padding-bottom:105.90909090909%"> ......
"""

We now have every <div class="base-unit clearfix"> block, and each one contains a <div class='base-img-wrap'>.

Let's get the img tags.

import requests
from bs4 import BeautifulSoup

req = requests.get('https://imgflip.com/?page=1').content  # must be named req: the next line parses it
soup = BeautifulSoup(req, "html.parser")
ancher = soup.find_all('div', {'class': "base-unit clearfix"})

for pt in ancher:
    img = pt.find('img', {'class': 'base-img'})
    if img:
        print(img)

 <img alt="Why is my sister's name Rose |  people that upvote good memes instead of just scrolling past them | image tagged in why is my sister's name rose | made w/ Imgflip meme maker" class="base-img" src="//i.imgflip.com/5aq7jq.jpg"/>
<img alt="Petition: upvote if you want a rule against upvote begging. I will then post the results in the Imgflip suggestion stream |  Upvote begging will keep happening as long as they make it to the front page; UPVOTE BEGGING TO DESTROY UPVOTE BEGGING | image tagged in memes,the scroll of truth,no no hes got a point,you have become the very thing you swore to destroy,memes | made w/ Imgflip meme maker" class="base-img" src="//i.imgflip.com/5aqvx4.jpg"/>

Nice, now we have all the img tags. Next we need to get the src value of each one.

import requests
from bs4 import BeautifulSoup

req = requests.get('https://imgflip.com/?page=1').content  # must be named req: the next line parses it
soup = BeautifulSoup(req, "html.parser")
ancher = soup.find_all('div', {'class': "base-unit clearfix"})

for pt in ancher:
    img = pt.find('img', {'class': 'base-img'})
    if img:
        link = img['src'].replace(img['src'][0:2],'https://')
        print(link)

"""
https://i.imgflip.com/5aq7jq.jpg
https://i.imgflip.com/5aqvx4.jpg
https://i.imgflip.com/5aq5jg.jpg
https://i.imgflip.com/5aor2n.jpg
https://i.imgflip.com/5amt83.jpg
https://i.imgflip.com/5ayodd.jpg
https://i.imgflip.com/5awhgz.jpg
https://i.imgflip.com/5allij.jpg
https://i.imgflip.com/5aosh7.jpg
https://i.imgflip.com/5amxbo.jpg
https://i.imgflip.com/5auvpo.jpg
"""
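The src values start with // (a protocol-relative URL), which is why the code above swaps the first two characters for https://. That works for these links, but str.replace would also touch any later // in the string, and it would mangle a src that is already absolute. A slightly more forgiving sketch uses urljoin from the standard library. The names PAGE_URL and absolute_url are just for illustration.

```python
from urllib.parse import urljoin

# The page the src values were scraped from; urljoin borrows its scheme and host.
PAGE_URL = "https://imgflip.com/?page=1"


def absolute_url(src, base=PAGE_URL):
    """Turn a protocol-relative, root-relative, or absolute src into a full URL."""
    return urljoin(base, src)


print(absolute_url("//i.imgflip.com/5aq7jq.jpg"))        # https://i.imgflip.com/5aq7jq.jpg
print(absolute_url("https://i.imgflip.com/5aq7jq.jpg"))  # already absolute, returned unchanged
```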

Now that we have all the image URLs, let's download each one with the requests module and save it.

import requests
from bs4 import BeautifulSoup

req = requests.get('https://imgflip.com/?page=1').content
soup = BeautifulSoup(req, "html.parser")
ancher = soup.find_all('div', {'class': "base-unit clearfix"})

for pt in ancher:
    img = pt.find('img', {'class': 'base-img'})
    if img:
        link = img['src'].replace(img['src'][0:2],'https://')
        r = requests.get(link)
        # The file name is the part after the host, e.g. 5aq7jq.jpg
        with open(img['src'].split('/')[3], 'wb') as f:  # write binary
            f.write(r.content)
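Taking split('/')[3] as the file name works because every src here looks like //i.imgflip.com/NAME.jpg. If a link had a query string or a deeper path, that index would pick the wrong piece. Here is a hedged sketch of a sturdier way to name and save the files. The helpers filename_from_url and save_bytes and the memes folder name are my own, not part of requests or bs4.

```python
import posixpath
from pathlib import Path
from urllib.parse import urlsplit


def filename_from_url(url):
    """Return the last path segment of a URL, ignoring any query string or fragment."""
    name = posixpath.basename(urlsplit(url).path)
    if not name:
        raise ValueError(f"no file name in URL: {url!r}")
    return name


def save_bytes(data, url, folder="memes"):
    """Write data to <folder>/<file name taken from url> and return the path."""
    target_dir = Path(folder)
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder on first use
    target = target_dir / filename_from_url(url)
    target.write_bytes(data)
    return target
```

In the loop above, this would be called as save_bytes(r.content, link), so all the memes land in one folder instead of next to the script.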

Great, we got all the memes from page 1. Now let's make the page number in the URL a parameter.

import requests
from bs4 import BeautifulSoup
def meme_stealer(page):
    req = requests.get(f'https://imgflip.com/?page={page}').content
    soup = BeautifulSoup(req, "html.parser")
    ancher = soup.find_all('div', {'class': "base-unit clearfix"})
    for pt in ancher:
        img = pt.find('img', {'class': 'base-img'})
        if img:
            link = img['src'].replace(img['src'][0:2],'https://')
            r = requests.get(link)
            # The file name is the part after the host, e.g. 5aq7jq.jpg
            with open(img['src'].split('/')[3], 'wb') as f:  # write binary
                f.write(r.content)

for i in range(1,6):
    meme_stealer(i)
# Page 1
# Page 2
# Page 3
# Page 4
# Page 5
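The only thing that changes between pages is the page query parameter. As a small sketch, the URLs can be built separately from the downloading, which makes it easy to check them before sending any request. The names BASE_URL, page_url, and page_urls are illustrative only.

```python
from urllib.parse import urlencode

BASE_URL = "https://imgflip.com/"


def page_url(page):
    """Build the listing URL for one page number (pages start at 1)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return f"{BASE_URL}?{urlencode({'page': page})}"


def page_urls(first, last):
    """Return the listing URLs for pages first..last, inclusive."""
    return [page_url(n) for n in range(first, last + 1)]


for url in page_urls(1, 5):
    print(url)
```

Note that range(1, 6) in the loop above stops at page 5, since the end of a range is not included.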

Thanks for reading.
Bye :D

Reference

About this topic (Steal Some Memes with Python): more material is available at the original post, https://dev.to/knassar702/steal-some-memes-with-python-lpe
