Scraping Kugou's T500 Music Chart with Scrapy



  • Getting started
  • Writing the code
  • Running the crawl


    1. Create the project: scrapy startproject kugouScrapy
    2. Create the spider: cd kugouScrapy, then scrapy genspider kugou www.kugou.com
    3. Override the project settings by adding the following to settings.py:
    ROBOTSTXT_OBEY = False   # ignore robots.txt, which would otherwise block the crawl
    COOKIES_ENABLED = False
    DOWNLOAD_DELAY = 0.25    # 250 ms of delay
    DEFAULT_REQUEST_HEADERS = {
      'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
      'Accept-Language': 'en',
    }
    

    4. Add a random User-Agent via a downloader middleware. I maintain my own pool of User-Agent strings rather than pulling in a random-agent package, and pick one at random for each request. In middlewares.py, write the random-header middleware:
    
    import random

    class RandomUserAgentMiddleware():
        def __init__(self):
            self.user_agents = ["Mozilla/5.0 (iPod; U; CPU iPhone OS 4_3_2 like Mac OS X; zh-cn) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8H7 Safari/6533.18.5",
        "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_2 like Mac OS X; zh-cn) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8H7 Safari/6533.18.5",
        "MQQBrowser/25 (Linux; U; 2.3.3; zh-cn; HTC Desire S Build/GRI40;480*800)",
        "Mozilla/5.0 (Linux; U; Android 2.3.3; zh-cn; HTC_DesireS_S510e Build/GRI40) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1",
        "Mozilla/5.0 (SymbianOS/9.3; U; Series60/3.2 NokiaE75-1 /110.48.125 Profile/MIDP-2.1 Configuration/CLDC-1.1 ) AppleWebKit/413 (KHTML, like Gecko) Safari/413",
        "Mozilla/5.0 (iPad; U; CPU OS 4_3_3 like Mac OS X; zh-cn) AppleWebKit/533.17.9 (KHTML, like Gecko) Mobile/8J2",
        "Mozilla/5.0 (Windows NT 5.2) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.122 Safari/534.30",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.202 Safari/535.1",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/534.51.22 (KHTML, like Gecko) Version/5.1.1 Safari/534.51.22",
        "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A5313e Safari/7534.48.3",
        "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A5313e Safari/7534.48.3",
        "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A5313e Safari/7534.48.3",
        "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.202 Safari/535.1",
        "Mozilla/5.0 (compatible; MSIE 9.0; Windows Phone OS 7.5; Trident/5.0; IEMobile/9.0; SAMSUNG; OMNIA7)",
        "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; XBLWP7; ZuneWP7)",
        "Mozilla/5.0 (Windows NT 5.2) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.122 Safari/534.30",
        "Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0",
        "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.2; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET4.0E; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET4.0C)",
        "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET4.0E; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET4.0C)",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)",
        "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)",
        "Opera/9.80 (Windows NT 5.1; U; zh-cn) Presto/2.9.168 Version/11.50",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)",
        "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET4.0E; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; .NET4.0C)",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN) AppleWebKit/533.21.1 (KHTML, like Gecko) Version/5.0.5 Safari/533.21.1",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; ) AppleWebKit/534.12 (KHTML, like Gecko) Maxthon/3.0 Safari/534.12",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; TheWorld)"]
    
        def process_request(self, request, spider):
            request.headers['User-Agent'] = random.choice(self.user_agents)
    

    Then enable it in settings.py:
    DOWNLOADER_MIDDLEWARES = {
        # 'kugouScrapy.middlewares.KugouscrapyDownloaderMiddleware': 543,
        'kugouScrapy.middlewares.RandomUserAgentMiddleware': 543,
    }
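    The middleware can be sanity-checked outside of a running crawl. Below is a minimal sketch with a stand-in request object (`FakeRequest` is a hypothetical test double, not part of Scrapy) and a shortened User-Agent pool:

```python
import random

class FakeRequest:
    """Stand-in for scrapy's Request, just enough for this check."""
    def __init__(self):
        self.headers = {}

class RandomUserAgentMiddleware:
    def __init__(self):
        # Two entries stand in for the full pool shown above.
        self.user_agents = [
            "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.202 Safari/535.1",
            "Opera/9.80 (Windows NT 5.1; U; zh-cn) Presto/2.9.168 Version/11.50",
        ]

    def process_request(self, request, spider):
        # Scrapy calls this once per outgoing request, so each request
        # gets a User-Agent drawn at random from the pool.
        request.headers['User-Agent'] = random.choice(self.user_agents)

mw = RandomUserAgentMiddleware()
req = FakeRequest()
mw.process_request(req, spider=None)
print(req.headers['User-Agent'] in mw.user_agents)  # → True
```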

    Writing the code


    1. Edit the spider file:
    
    import json

    import scrapy

    from kugouScrapy.items import KugouscrapyItem

    START_PAGE = 1   # page range; not defined in the original post, adjust as needed
    END_PAGE = 17    # T500 = 500 songs / 30 per page ≈ 17 pages

    class KugouSpider(scrapy.Spider):
        name = 'kugou'
        allowed_domains = []
        start_urls = []

        # Override start_requests to generate one request per chart page
        def start_requests(self):
            for i in range(START_PAGE, END_PAGE):
                url = 'http://mobilecdngz.kugou.com/api/v3/rank/song?rankid=8888&ranktype=2&page=%s&pagesize=30&volid=&plat=2&version=8955&area_code=1' % i
                yield scrapy.Request(url=url, callback=self.parse)

        def parse(self, response):
            song_list = json.loads(response.body)['data']['info']
            for song in song_list:
                # A second request to this URL returns the playable URL for the song
                song_download_url = "http://www.kugou.com/yy/index.php?r=play/getdata&hash=%s&album_id=%s&_=1523819864065" % (song['hash'], song['album_id'])
                item = KugouscrapyItem()
                item["music_name"] = song['filename'].split("-")[0]
                item["music_singer"] = song['filename'].split("-")[1]
                item["music_url"] = ''
                yield scrapy.Request(url=song_download_url, meta={'item': item}, callback=self.get_song_download_url)

        def get_song_download_url(self, response):
            item = response.meta['item']
            res = json.loads(response.body)
            item["music_url"] = res['data']['play_url']
            yield item
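    The filename split in parse() can be tried on its own: the text before the first '-' becomes the name, the text after it becomes the singer. The filename below is made up for illustration (real values come from the rank API):

```python
# Split a "name - singer"-style filename the way parse() does.
filename = "Shape of You - Ed Sheeran"  # made-up example value

music_name = filename.split("-")[0].strip()
music_singer = filename.split("-")[1].strip()
print(music_name)    # → Shape of You
print(music_singer)  # → Ed Sheeran
```

    Note that a title containing its own hyphen would be cut short by a plain split; `filename.split("-", 1)` limits the split to the first hyphen and is safer.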
    

    2. Write the items file. An Item is the container that holds the extracted data, and it is used like a dict. To define one, subclass scrapy.Item and declare each field with scrapy.Field():
    class KugouscrapyItem(scrapy.Item):
        # define the fields for your item here like:
        collection = 'kugou_song'
        music_name = scrapy.Field()
        music_singer=scrapy.Field()
        music_url=scrapy.Field()
    

    3. In pipelines.py, save the music to local disk:
    import os

    import requests

    headers = {
            'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A334 Safari/7534.48.3',
            'Referer': 'http://m.kugou.com/rank/info/8888',
            'Cookie': 'UM_distinctid=161d629254c6fd-0b48b34076df63-6b1b1279-1fa400-161d629255b64c; kg_mid=cb9402e79b3c2b7d4fc13cbc85423190; Hm_lvt_aedee6983d4cfc62f509129360d6bb3d=1523818922; Hm_lpvt_aedee6983d4cfc62f509129360d6bb3d=1523819865; Hm_lvt_c0eb0e71efad9184bda4158ff5385e91=1523819798; Hm_lpvt_c0eb0e71efad9184bda4158ff5385e91=1523820047; musicwo17=kugou'
            }
    class KugouscrapyPipeline(object):
        def process_item(self, item, spider):
            # MUSIC_URL is the local download directory (configured in settings)
            ext = item["music_url"].split('.')[-1]
            file_path = '%s/%s-%s.%s' % (MUSIC_URL, item["music_name"], item["music_singer"], ext)

            if not os.path.exists(file_path):
                response = requests.get(url=item["music_url"], headers=headers, stream=True)
                with open(file_path, 'wb') as handle:
                    for block in response.iter_content(1024):
                        if not block:
                            break
                        handle.write(block)
            return item
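    One caveat on the file-path construction: `split('.')[-1]` takes everything after the last dot, so a play_url that carries a query string would leak it into the extension. A small sketch (the URL is made up, not a real Kugou play_url):

```python
from urllib.parse import urlparse

music_url = "http://example.com/audio/abc123.mp3?token=xyz"

# The naive split picks up the query string along with the extension:
naive_ext = music_url.split('.')[-1]           # → 'mp3?token=xyz'

# Parsing the URL and splitting only the path keeps just the extension:
ext = urlparse(music_url).path.split('.')[-1]  # → 'mp3'

print(naive_ext)
print(ext)
```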
    

    4. Store the song metadata in MongoDB.
    1) Add the configuration to settings.py:
    MONGO_URI = '127.0.0.1'
    MONGO_DB = 'kugou'
    ITEM_PIPELINES = {
        'kugouScrapy.pipelines.KugouscrapyPipeline': 300,
        'kugouScrapy.pipelines.MongoPipeline': 400,
    }
    2) Add collection = 'kugou_song' to the Item (shown in step 2 above).
    3) Add the pipeline class:
    import pymongo

    class MongoPipeline(object):
        def __init__(self, mongo_uri, mongo_db):
            self.mongo_uri = mongo_uri
            self.mongo_db = mongo_db
    
        @classmethod
        def from_crawler(cls, crawler):
            return cls(mongo_uri=crawler.settings.get('MONGO_URI'), mongo_db=crawler.settings.get('MONGO_DB'))
    
        def open_spider(self, spider):
            self.client = pymongo.MongoClient(self.mongo_uri)
            self.db = self.client[self.mongo_db]
    
        def process_item(self, item, spider):
            self.db[item.collection].insert_one(dict(item))
            return item
    
        def close_spider(self, spider):
            self.client.close()
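    The from_crawler hook above is how Scrapy hands settings to the pipeline: Scrapy calls the classmethod with the crawler object, and the pipeline reads its configuration from crawler.settings. The wiring is sketched below with a stand-in crawler (`FakeCrawler` is a hypothetical test double) so it can be run without Scrapy or MongoDB:

```python
class FakeCrawler:
    """Stand-in for scrapy's Crawler: exposes a settings object with .get()."""
    def __init__(self, settings):
        self.settings = settings  # a plain dict already provides .get()

class MongoPipeline:
    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy invokes this; the pipeline pulls its config from settings.
        return cls(mongo_uri=crawler.settings.get('MONGO_URI'),
                   mongo_db=crawler.settings.get('MONGO_DB'))

crawler = FakeCrawler({'MONGO_URI': '127.0.0.1', 'MONGO_DB': 'kugou'})
pipeline = MongoPipeline.from_crawler(crawler)
print(pipeline.mongo_uri, pipeline.mongo_db)  # → 127.0.0.1 kugou
```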
    

    Finally, run the crawl with scrapy crawl kugou. A downloadable copy of the code, with analysis, is here: https://download.csdn.net/download/huangwencai123/11142791
