Capturing the 1688 site's CAPTCHA image with selenium + Python


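1. Background
•When scraping data from the 1688 site, visiting too frequently triggers a CAPTCHA login dialog like the one below, whether or not the user is logged in.
[Image: CAPTCHA login dialog]

The CAPTCHA is typically an element like this: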


<img id="J_CheckCodeImg1" width="100" height="30" onmousedown="return false;" src="//pin.aliyun.com/get_img?identity=sm-searchweb2&amp;sessionid=9c3a51d81de07ddf1bfd9bbc70863b0f&amp;type=default&amp;t=1511315617645">
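
The src value of this element is protocol-relative (it starts with //) and, as copied from the page source, HTML-escaped (&amp;). As a minimal sketch, the standard library can turn it into an absolute URL and expose its query parameters. The helper name absolute_captcha_url is hypothetical, and the page URL https://s.1688.com/ is an assumption used only for illustration.

```python
from html import unescape
from urllib.parse import parse_qs, urljoin, urlsplit


def absolute_captcha_url(page_url, raw_src):
    """Resolve an HTML-escaped, protocol-relative src against the page URL."""
    return urljoin(page_url, unescape(raw_src))


# src attribute exactly as it appears in the element above
raw_src = ("//pin.aliyun.com/get_img?identity=sm-searchweb2&amp;"
           "sessionid=9c3a51d81de07ddf1bfd9bbc70863b0f&amp;"
           "type=default&amp;t=1511315617645")

# Assumed page URL; only its scheme matters for a protocol-relative src
url = absolute_captcha_url("https://s.1688.com/", raw_src)
params = parse_qs(urlsplit(url).query)

print(url)
print(params["sessionid"][0], params["t"][0])
```

The sessionid and t parameters are the parts of the link that tie it to one session and one moment in time.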

•There are generally two ways to obtain the CAPTCHA image.
•First, fetch the image through the link in the element above: src="//pin.aliyun.com/get_img?identity=sm-searchweb2&sessionid=9c3a51d81de07ddf1bfd9bbc70863b0f&type=default&t=1511315617645". This approach sometimes fails: the image returned when the extracted URL is opened differs from the CAPTCHA currently shown on the page, and its content keeps changing.
•Second, use selenium to take a screenshot of the visible area, determine the position and size of the CAPTCHA element, crop the screenshot with Image (from the PIL module) to obtain the CAPTCHA image, and then send it to a CAPTCHA-recognition module or a code-solving platform.
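
The second approach comes down to a rectangle computation. Below is a minimal sketch, assuming location and size are dictionaries shaped like Selenium's WebElement.location and WebElement.size, and assuming screenshot pixels map 1:1 to page pixels. The helper name element_crop_box is hypothetical.

```python
def element_crop_box(location, size):
    """Return (left, top, right, bottom), the tuple form PIL's Image.crop expects."""
    left = int(location["x"])
    top = int(location["y"])
    right = left + int(size["width"])
    bottom = top + int(size["height"])
    return (left, top, right, bottom)


# Illustrative values: a 100 x 30 CAPTCHA image whose top-left corner is at (255, 167)
box = element_crop_box({"x": 255.0, "y": 167.0}, {"width": 100, "height": 30})
print(box)  # (255, 167, 355, 197)
```

The resulting tuple can be passed directly to the crop method of a PIL image opened from the screenshot file.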
2. Environment
•python 3.6.1
•System
•IDE: pycharm
•chrome browser installed
•chromedriver configured
•selenium 3.7.0
3. Page structure analysis
[Image: page structure screenshot]

Analyzing the page source leads to the following conclusions:
•The CAPTCHA login dialog is embedded in the page through an iframe.
•It is not the only iframe on the page.
•The CAPTCHA iframe has distinctive features: id="sufei-dialog-content" and src="https://sec.1688.com/query.htm?……"


<iframe id="sufei-dialog-content" frameborder="none" src="https://sec.1688.com/query.htm?style=mini&amp;smApp=searchweb2&amp;smPolicy=searchweb2-RpcAsyncAll-anti_Spider-checkcode&amp;smCharset=GBK&amp;smTag=MTIxLjE1LjI2LjIzMywzNTE1MTA4MjI5LGFlNGE1ZGI1YTQ4NDQ3NTNiYzY5OTZlZmU1OWE3Njhm&amp;smReturn=https%3A%2F%2Fs.1688.com%2Fselloffer%2Frpc_async_render.jsonp%3Fkeywords%3D%25CF%25B4%25CD%25EB%25B2%25BC%26startIndex%3D0%26n%3Dy%26pageSize%3D60%26rpcflag%3Dnew%26async%3Dtrue%26templateConfigName%3DmarketOfferresult%26enableAsync%3Dtrue%26qrwRedirectEnabled%3Dfalse%26filterP4pIds%3D1245873517%252C561786598916%252C559726907082%252C523166432402%252C557139543735%252C529784793813%252C543923733444%252C560590249743%26asyncCount%3D20%26_pageName_%3Dmarket%26offset%3D9%26uniqfield%3Dpic_tag_id%26leftP4PIds%3D%26callback%3DjQuery18305735956012709345_1511341604992%26beginPage%3D48%26_%3D1511341615310&amp;smSign=XKm5xSgAkIixvOkhV1VSyg%3D%3D" cd_frame_id_="c4ae94ef2bea60f0b4729f319df59251"></iframe>
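
Since the page holds several iframes, the two features above can serve as a filter. The following is a minimal sketch of such a check, using the same substring tests as the code in section 4; the function name is_captcha_iframe is hypothetical, and the second and third sample inputs are made up for illustration.

```python
def is_captcha_iframe(frame_id, frame_src):
    """True when both the id and the src look like the 1688 CAPTCHA dialog."""
    if not frame_id or not frame_src:
        # get_attribute may return None or an empty string for other iframes
        return False
    return "dialog" in frame_id and "sec.1688.com" in frame_src


print(is_captcha_iframe("sufei-dialog-content", "https://sec.1688.com/query.htm?style=mini"))
print(is_captcha_iframe(None, "https://sec.1688.com/query.htm?style=mini"))
print(is_captcha_iframe("ad-frame", "https://example.com/banner"))
```

Checking both attributes keeps an unrelated iframe that merely has "dialog" in its id from being mistaken for the CAPTCHA dialog.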

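4. Code


# Import the libraries and start the browser
from selenium import webdriver
import time
from PIL import Image
browser = webdriver.Chrome()
# Fix the window size so that the CAPTCHA dialog falls inside the visible area captured by the screenshot
browser.set_window_size(960, 960)
# CAPTCHA handler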
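def captchaHandler(browser, DamatuInstance):
  # DamatuInstance is not used in this excerpt
  # Selenium 3 API, matching the 3.7.0 environment above; Selenium 4 uses find_elements(By.TAG_NAME, 'iframe')
  iframeLst = browser.find_elements_by_tag_name('iframe')
  print(f"captchaHandler: enter , iframeLst = {iframeLst}")
  for iframe in iframeLst:
    iframeID = iframe.get_attribute('id')
    iframeSrc = iframe.get_attribute('src')
    print(f"captchaHandler: iframeID = {iframeID}, iframeSrc = {iframeSrc}")
    # Pick out the CAPTCHA iframe by its id and src
    if iframeID and iframeID.find('dialog') != -1:
      if iframeSrc and iframeSrc.find(r'sec.1688.com') != -1:
        # Size of the iframe
        frameWidth = iframe.size['width']
        frameHeight = iframe.size['height']
        # A non-zero size is treated as the sign that the CAPTCHA dialog is displayed
        if frameWidth > 0 and frameHeight > 0:
          print(f"captchaHandler: CAPTCHA dialog is displayed, frameWidth = {frameWidth}, frameHeight = {frameHeight}")
          # In chrome this captures only the visible area, not the whole html page
          # The clawerImgs directory must already exist under the project directory
          browser.get_screenshot_as_file('clawerImgs/screenshot.png')
          # Top-left corner of the iframe within the visible area; the window was fixed at 960 X 960 above
          # location_once_scrolled_into_view: coordinates relative to the visible area
          # location: coordinates relative to the whole html page
          frameX = int(iframe.location_once_scrolled_into_view['x'])
          frameY = int(iframe.location_once_scrolled_into_view['y'])
          print(f"captchaHandler: frameX = {frameX}, frameY = {frameY}, frameWidth = {frameWidth}, frameHeight = {frameHeight}")
          # Bounding box of the iframe inside the screenshot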
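          left = frameX
          top = frameY
          right = frameX + frameWidth
          bottom = frameY + frameHeight
          # Open the screenshot with Image and crop out the frame, i.e. the CAPTCHA login dialog
          imgFrame = Image.open('clawerImgs/screenshot.png')
          imgFrame = imgFrame.crop((left, top, right, bottom)) # crop
          imgFrame.save('clawerImgs/iframe.png')
          # Switch into the frame; elements inside it cannot be located until the driver has switched to the iframe
          browser.switch_to.frame(iframe)
          # ------ Approach 1: crop the CAPTCHA relative to the frame screenshot
          # Locate the CAPTCHA image element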
          captchaElem = browser.find_element_by_xpath("//img[contains(@id, 'CheckCodeImg')]")
          # After switching into the frame, location is relative to the frame,
          # so the coordinates can be used directly on the frame screenshot
          captchaX = int(captchaElem.location['x'])
          captchaY = int(captchaElem.location['y'])
          # Size of the CAPTCHA image
          captchaWidth = captchaElem.size['width']
          captchaHeight = captchaElem.size['height']
          captchaRight = captchaX + captchaWidth
          captchaBottom = captchaY + captchaHeight
          print(f"captchaHandler: 1 captchaX = {captchaX}, captchaY = {captchaY}, captchaWidth = {captchaWidth}, captchaHeight = {captchaHeight}")
          # Open with Image and crop; reference image: the frame screenshot
          imgObject = Image.open('clawerImgs/iframe.png')
          imgCaptcha = imgObject.crop((captchaX, captchaY, captchaRight, captchaBottom))   # crop
          imgCaptcha.save('clawerImgs/captcha1.png')
          # ------ Approach 2: crop relative to the whole visible-area screenshot, so the iframe offset must be added
          captchaElem = browser.find_element_by_xpath("//img[contains(@id, 'CheckCodeImg')]")
          captchaX = int(captchaElem.location['x']) + frameX
          captchaY = int(captchaElem.location['y']) + frameY
          captchaWidth = captchaElem.size['width']
          captchaHeight = captchaElem.size['height']
          captchaRight = captchaX + captchaWidth
          captchaBottom = captchaY + captchaHeight
          print(f"captchaHandler: 2 captchaX = {captchaX}, captchaY = {captchaY}, captchaWidth = {captchaWidth}, captchaHeight = {captchaHeight}")
          # Open with Image and crop; reference image: the whole visible-area screenshot
          imgObject = Image.open('clawerImgs/screenshot.png')
          imgCaptcha = imgObject.crop((captchaX, captchaY, captchaRight, captchaBottom))    # crop
          imgCaptcha.save('clawerImgs/captcha2.png')
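
The function above leaves the driver focused on the iframe after browser.switch_to.frame(iframe). If later steps need the outer page again, one option is to wrap the switch in a context manager that always returns to the top-level document. This is only a sketch: inside_frame is a hypothetical helper, and FakeDriver and FakeSwitchTo are stand-ins so the example runs without a browser. With a real driver, the calls made are switch_to.frame and switch_to.default_content.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame):
    """Switch into `frame`, then always switch back to the top-level document."""
    driver.switch_to.frame(frame)
    try:
        yield driver
    finally:
        driver.switch_to.default_content()


# Stand-in objects (not Selenium classes) that only record the calls they receive
class FakeSwitchTo:
    def __init__(self):
        self.calls = []

    def frame(self, frame):
        self.calls.append(("frame", frame))

    def default_content(self):
        self.calls.append(("default_content",))


class FakeDriver:
    def __init__(self):
        self.switch_to = FakeSwitchTo()


driver = FakeDriver()
with inside_frame(driver, "sufei-dialog-content"):
    pass  # locate the CAPTCHA <img> and read its location and size here

print(driver.switch_to.calls)
```

Because the switch back happens in a finally clause, it also runs when locating the CAPTCHA element raises an exception.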

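5. Results
•Whole visible area: screenshot.png
[Image: screenshot.png]

•CAPTCHA login dialog iframe region: iframe.png
[Image: iframe.png]

•CAPTCHA image cropped relative to the iframe: captcha1.png
[Image: captcha1.png]

•CAPTCHA image cropped relative to the whole visible area: captcha2.png
[Image: captcha2.png]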

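6. Further notes


# Reference: https://www.cnblogs.com/my8100/p/7225408.html
chrome
  default:
    location: does not scroll; returns coordinates relative to the whole html page {'x': 15.0, 'y': 129.0}
    location_once_scrolled_into_view: returns coordinates relative to the visible area (the y value depends on the browser height, as the cases below show)
      top/bottom element fully visible, no scrolling: {u'x': 15, u'y': 60}
      top element partly hidden, scrolled until aligned with the top: {u'x': 15, u'y': 0} account-wall
      bottom element partly hidden, scrolled until aligned with the bottom: {u'x': 15, u'y': 594} theme-list
  frame:
    location: does not scroll; returns coordinates relative to the frame, i.e. the inner html document {'x': 255.0, 'y': 167.0} (lc-refresh inside captcha_frame)
    location_once_scrolled_into_view: returns coordinates relative to the visible area
      fully visible, no scrolling: {u'x': 273, u'y': 105}
      partly hidden, scrolled until aligned with the top: {u'x': 273, u'y': 0}
firefox
  default:
    (values listed for a top element and a bottom element)
    location: does not scroll; returns coordinates relative to the whole html page {'x': 15.0, 'y': 130.0} {'x': 15.0, 'y': 707.0}
    location_once_scrolled_into_view: returns coordinates relative to the visible area (y=1 rather than 0 in the results below)
      visible or not, the element is scrolled until aligned with the top: {'x': 15.0, 'y': 1.0} {'x': 15.0, 'y': 1.0}
      if the scrollbar is already at the bottom, the bottom element cannot be aligned with the top: {'x': 15.0, 'y': 82.0}
  frame:
    location: does not scroll; returns coordinates relative to the frame, i.e. the inner html document {'x': 255.0, 'y': 166.0}
    location_once_scrolled_into_view: visible or not, the element is scrolled until aligned with the top ('y' is still 166.0)
      the result is also relative to the frame, i.e. the inner html document {'x': 255.0, 'y': 166.0}
# Summary
location
  never scrolls; returns coordinates relative to the whole html page or to the enclosing frame
location_once_scrolled_into_view
  chrome does not scroll a fully visible element, firefox always scrolls; when an element is partly hidden, chrome scrolls until it is aligned with the top or bottom and returns coordinates relative to the visible area.
  it generally returns coordinates relative to the visible area, but inside a frame firefox returns coordinates relative to the frame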
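
The two properties differ mainly in their reference point: the whole page versus the visible area. As a rough sketch, page coordinates can be converted to visible-area coordinates by subtracting the scroll offset. The helper name page_to_viewport is hypothetical, and with Selenium the offsets are assumed to come from something like driver.execute_script("return [window.pageXOffset, window.pageYOffset]"). The 69 px scroll offset below is an illustrative value, chosen so that the chrome page coordinate y = 129 maps to the visible-area coordinate y = 60 recorded above.

```python
def page_to_viewport(location, scroll_x, scroll_y):
    """Convert page-relative coordinates to visible-area (viewport) coordinates."""
    return {"x": location["x"] - scroll_x, "y": location["y"] - scroll_y}


# Illustrative: page position (15, 129) with the page scrolled down by 69 px
print(page_to_viewport({"x": 15.0, "y": 129.0}, 0, 69))  # {'x': 15.0, 'y': 60.0}
```

Since a chrome screenshot covers only the visible area, it is the visible-area coordinates that should be used as the crop box.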
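# Reference: https://zhuanlan.zhihu.com/p/25171554
selenium.webdriver has built-in support for taking a screenshot of the current page:
  a. WebDriver.Chrome: the built-in method captures only the current window and cannot target a specific element; capturing a specific element, or a page longer than one screen, requires another approach.
  b. WebDriver.PhantomJS: the built-in method supports capturing the whole page.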
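
Because a chrome screenshot covers only the visible window, cropping works only if the target rectangle lies entirely inside it. Below is a minimal sketch of that check; fits_in_viewport is a hypothetical helper, and the viewport size is assumed to be read separately, for example with driver.execute_script("return [window.innerWidth, window.innerHeight]"). Note that set_window_size(960, 960) sets the outer window, so the real viewport is somewhat smaller than the 960 x 960 used here for illustration.

```python
def fits_in_viewport(box, viewport_width, viewport_height):
    """True when the (left, top, right, bottom) box lies fully inside the viewport."""
    left, top, right, bottom = box
    inside_x = 0 <= left < right <= viewport_width
    inside_y = 0 <= top < bottom <= viewport_height
    return inside_x and inside_y


# Illustrative boxes tested against an assumed 960 x 960 visible area
print(fits_in_viewport((255, 167, 355, 197), 960, 960))  # box fully inside
print(fits_in_viewport((255, 950, 355, 980), 960, 960))  # box runs past the bottom edge
```

When the check fails, the element has to be scrolled into view, or the window enlarged, before the screenshot is taken.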

Summary
That covers how to capture the 1688 site's CAPTCHA image with selenium + Python. I hope it is useful. If you have any questions, leave a message and I will reply as soon as I can.


