Web scraping: getting historical stock trading data

Scraping strategy
1. Get the stock codes from East Money (http://quote.eastmoney.com/stocklist.html).
2. NetEase Finance lets you download CSV files directly, from pages with addresses like http://quotes.money.163.com/trade/lsjysj_600508.html#01b07.
3. Neither site requires cookies, so both are easy to scrape: just control the interval between requests. The scraping involved is not very aggressive.
4. Many of the codes from East Money belong to funds (e.g. codes starting with 1 or 5). NetEase Finance has no data for these, so they need to be filtered out.
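Point 3 comes down to spacing out requests. Below is a minimal standard-library sketch, assuming a fixed minimum gap plus a random jitter is enough to stay polite. The `Throttle` class and its parameters are hypothetical; the script later in this post simply calls `time.sleep(random.choice([1, 2]))`.

```python
import random
import time


class Throttle:
    """Enforce a minimum gap (plus optional random jitter) between successive requests."""

    def __init__(self, min_interval: float, jitter: float = 0.0):
        self.min_interval = min_interval  # seconds that must pass between two calls
        self.jitter = jitter              # extra random delay drawn from [0, jitter]
        self._last = None                 # monotonic timestamp of the previous call

    def wait(self) -> float:
        """Sleep just long enough to respect the interval; return the seconds slept."""
        now = time.monotonic()
        slept = 0.0
        if self._last is not None:
            target = self.min_interval + random.uniform(0, self.jitter)
            remaining = target - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last = time.monotonic()
        return slept
```

Usage: create one `Throttle(min_interval=1.0, jitter=1.0)` and call `wait()` before each `requests.get(...)` to get roughly 1 to 2 seconds between requests. Unlike a fixed sleep, time already spent parsing or writing files counts toward the gap.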
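Getting the stock codes
This step uses BeautifulSoup: pip install bs4 (the code also uses the lxml parser, requests and redis, so pip install lxml requests redis as well).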

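All the stock codes sit in <a> tags under the <ul> tags (inside <div id="quotesearch">). The scraped codes are saved to a local txt file or to a redis database, so the crawl can be distributed or resumed after an interruption.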
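The txt file alone is enough to resume an interrupted crawl without redis. A minimal sketch, assuming one name per line in stock_names.txt and one <name>.csv per finished stock in the stock_data/ folder used later in this post; `pending_names` is a hypothetical helper, not part of the original code:

```python
from pathlib import Path


def pending_names(names_file: str, out_dir: str) -> list[str]:
    """Return the stock names from names_file whose CSV is not yet in out_dir.

    Assumes one name per line and that each download is saved as <out_dir>/<name>.csv.
    """
    done = {p.stem for p in Path(out_dir).glob("*.csv")}  # names already downloaded
    names = Path(names_file).read_text(encoding="utf-8").splitlines()
    return [n for n in names if n and n not in done]      # skip blank lines and finished stocks
```

Usage: `for name in pending_names('stock_names.txt', 'stock_data'): ...` picks up where the last run stopped. Redis remains the better fit when several machines share one work list.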
For installing redis, see: https://blog.csdn.net/tonydz0523/article/details/82493480
The code is as follows:

```python
import random
import time

import redis
import requests
from bs4 import BeautifulSoup as bs


def get_stock_names():
    """
    Fetch the names of all stocks and save them to redis and to a txt file.
    """
    # Connect to redis db 1. The connection string is a placeholder:
    # the original password/host were not preserved.
    rds = redis.from_url('redis://:password@localhost:6379', db=1, decode_responses=True)

    url = "http://quote.eastmoney.com/stocklist.html"
    headers = {
        'Referer': 'http://quote.eastmoney.com',
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36'
    }

    response = requests.get(url, headers=headers).content.decode('gbk')  # the page is GBK-encoded
    soup = bs(response, 'lxml')
    all_ul = soup.find('div', id='quotesearch').find_all('ul')  # all <ul> tags holding the stock links
    with open('stock_names.txt', 'w', encoding='utf-8') as f:
        for ul in all_ul:
            all_a = ul.find_all('a')  # all <a> tags inside this <ul>
            for a in all_a:
                rds.rpush('stock_names', a.text)  # a.text is the link text; rpush appends it to the tail of the list
                f.write(a.text + '\n')
```
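If you would rather not depend on bs4 and lxml, the same extraction can be sketched with the standard library's `html.parser`. This is a swapped-in technique, not the author's approach, and `StockLinkParser` is a hypothetical name. Unlike the BeautifulSoup version, it does not restrict itself to `<div id="quotesearch">`; it collects the text of every `<a>` nested inside a `<ul>`.

```python
from html.parser import HTMLParser


class StockLinkParser(HTMLParser):
    """Collect the text of every <a> that appears inside a <ul>."""

    def __init__(self):
        super().__init__()
        self.ul_depth = 0    # how many <ul> elements we are currently inside
        self.in_a = False    # True while between <a> and </a> inside a <ul>
        self.names = []      # collected link texts, in document order
        self._buf = []       # text fragments of the current <a>

    def handle_starttag(self, tag, attrs):
        if tag == "ul":
            self.ul_depth += 1
        elif tag == "a" and self.ul_depth > 0:
            self.in_a = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "ul" and self.ul_depth > 0:
            self.ul_depth -= 1
        elif tag == "a" and self.in_a:
            self.in_a = False
            self.names.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self.in_a:
            self._buf.append(data)
```

Usage: `parser = StockLinkParser(); parser.feed(html); parser.names`, where `html` is the GBK-decoded page text from `requests.get(url, headers=headers).content.decode('gbk')`.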
  
Data in redis:
[Figure: the stock names stored in redis]
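Each entry has the form name(code), which is why get_data below pulls the code out with `split('(')`. A slightly stricter sketch with a regular expression, assuming codes are always six digits; `parse_stock_name` is a hypothetical helper:

```python
import re

# Assumed entry format, inferred from the split('(') logic in get_data: "<name>(<6-digit code>)"
_NAME_CODE = re.compile(r"^(?P<name>.*)\((?P<code>\d{6})\)$")


def parse_stock_name(entry: str):
    """Split an entry like '平安银行(000001)' into (name, code); return None if it does not match."""
    m = _NAME_CODE.match(entry.strip())
    if m is None:
        return None
    return m.group("name"), m.group("code")
```

Returning None for malformed entries lets the caller skip them, instead of raising inside the `try` block and pushing the same bad entry back onto the list.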
Getting the historical stock data
Analyzing the page:
  
Here we can get the date range of a stock's data: it is the value attribute of the <input> tags with name="date_start_type" and name="date_end_type".

```python
stock_url = 'http://quotes.money.163.com/trade/lsjysj_{}.html'.format(stock_code)
response = requests.get(stock_url, headers=headers).text
soup = bs(response, 'lxml')
start_time = soup.find('input', {'name': 'date_start_type'}).get('value').replace('-', '')
end_time = soup.find('input', {'name': 'date_end_type'}).get('value').replace('-', '')
```
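The `.replace('-', '')` call turns a value like 2018-09-20 into the 20180920 form the download URL uses, but it silently passes through anything else. A minimal sketch of a stricter conversion, assuming the input values are always YYYY-MM-DD; `to_netease_date` is a hypothetical helper:

```python
from datetime import datetime


def to_netease_date(value: str) -> str:
    """Convert a 'YYYY-MM-DD' input value into the 'YYYYMMDD' form used in the download URL.

    strptime raises ValueError on malformed values, so a changed page layout
    fails loudly instead of producing a bad URL.
    """
    return datetime.strptime(value.strip(), "%Y-%m-%d").strftime("%Y%m%d")
```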
  
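Clicking download sends the request shown in the figure below:
[Figure: the download request and its parameters]
This shows that knowing the code, start and end parameters is enough to get the data. Note that the code here differs slightly from the one we scraped: Shanghai codes carry an extra leading 0, and Shenzhen codes all carry a leading 1.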
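The prefix rule and the URL format can be isolated into two small functions. This is a sketch, assuming the filter used in get_data below (Shanghai 6/9 gets a leading 0; Shenzhen 0/2/3, except 201 to 204, gets a leading 1). `netease_code` and `download_url` are hypothetical names, and the field list is copied from the original download URL.

```python
from urllib.parse import urlencode

FIELDS = "TCLOSE;HIGH;LOW;TOPEN;LCLOSE;CHG;PCHG;TURNOVER;VOTURNOVER;VATURNOVER;TCAP;MCAP"


def netease_code(stock_code: str):
    """Prefix a 6-digit code the way the download URL expects, or return None to skip it."""
    if len(stock_code) != 6 or not (stock_code.isascii() and stock_code.isdigit()):
        return None
    if stock_code[0] in "69":                 # Shanghai
        return "0" + stock_code
    if stock_code[0] in "023" and stock_code[:3] not in {"201", "202", "203", "204"}:
        return "1" + stock_code               # Shenzhen
    return None                               # funds (1xx, 5xx, ...) and other codes without data


def download_url(stock_code: str, start: str, end: str) -> str:
    """Build the chddata.html URL; safe=';' keeps the field separators unescaped."""
    code = netease_code(stock_code)
    if code is None:
        raise ValueError(f"no NetEase data for {stock_code}")
    query = urlencode({"code": code, "start": start, "end": end, "fields": FIELDS}, safe=";")
    return "http://quotes.money.163.com/service/chddata.html?" + query
```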
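The code is as follows:

```python
import os
import random
import time

import redis
import requests
from bs4 import BeautifulSoup as bs

# get_data() needs its own connection: the rds in get_stock_names() is local to that function.
# The connection string is a placeholder; the original password/host were not preserved.
rds = redis.from_url('redis://:password@localhost:6379', db=1, decode_responses=True)


def get_data():
    headers = {
        'Referer': 'http://quotes.money.163.com/',
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.81 Safari/537.36'
    }
    os.makedirs('stock_data', exist_ok=True)  # the CSV files are written to this folder
    while True:
        stock_name = rds.lpop('stock_names')  # take the next stock name from the head of the redis list
        # for stock_name in stock_names:
        if not stock_name:
            break  # the list is empty: every stock has been handled
        try:
            stock_code = stock_name.split('(')[1].split(')')[0]
            # Not every code has data on NetEase (funds, for example), so filter them out.
            # Shanghai codes start with 6 or 9, Shenzhen codes with 0, 2 or 3;
            # codes starting with 201/202/203/204 are skipped.
            # In the download URL, Shanghai codes get a leading 0 and Shenzhen codes a leading 1.
            if stock_code[0] in '69':
                stock_code_new = '0' + stock_code
            elif stock_code[0] in '023' and stock_code[:3] not in ('201', '202', '203', '204'):
                stock_code_new = '1' + stock_code
            else:
                continue

            stock_url = 'http://quotes.money.163.com/trade/lsjysj_{}.html'.format(stock_code)
            response = requests.get(stock_url, headers=headers).text
            soup = bs(response, 'lxml')
            start_time = soup.find('input', {'name': 'date_start_type'}).get('value').replace('-', '')  # start date
            end_time = soup.find('input', {'name': 'date_end_type'}).get('value').replace('-', '')  # end date
            time.sleep(random.choice([1, 2]))  # wait 1 or 2 seconds between requests
            download_url = "http://quotes.money.163.com/service/chddata.html?code={}&start={}&end={}&fields=TCLOSE;HIGH;LOW;TOPEN;LCLOSE;CHG;PCHG;TURNOVER;VOTURNOVER;VATURNOVER;TCAP;MCAP".format(stock_code_new, start_time, end_time)
            data = requests.get(download_url, headers=headers, stream=True)
            with open('stock_data/{}.csv'.format(stock_name), 'wb') as f:  # save the CSV
                for chunk in data.iter_content(chunk_size=10000):
                    if chunk:
                        f.write(chunk)
            print("{} downloaded".format(stock_name))

        except Exception as e:
            rds.rpush('stock_names', stock_name)  # put it back on the list so it is retried later
            print(e)
```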
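One consequence of pushing failures back with `rpush` is that a stock whose page always fails is retried forever. A sketch of the same FIFO-with-requeue pattern with a retry cap, using `collections.deque` as a stand-in for the redis list. `drain_queue` and `max_attempts` are hypothetical; with redis, `handler` would be the body of get_data's `try` block.

```python
from collections import deque


def drain_queue(queue, handler, max_attempts=3):
    """Process names FIFO (like lpop/rpush), re-queuing failures up to max_attempts times.

    `queue` is a deque standing in for the redis list; `handler` does the per-stock work.
    Returns (succeeded, given_up).
    """
    attempts = {}
    succeeded, given_up = [], []
    while queue:
        name = queue.popleft()               # like rds.lpop('stock_names')
        try:
            handler(name)
            succeeded.append(name)
        except Exception as exc:             # mirrors the original broad except
            attempts[name] = attempts.get(name, 0) + 1
            if attempts[name] < max_attempts:
                queue.append(name)           # like rds.rpush('stock_names', name)
            else:
                given_up.append(name)
                print(f"giving up on {name}: {exc}")
    return succeeded, given_up
```

With redis, the attempt counts could live in a redis hash instead of a local dict so that several workers share them.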
  
Mission accomplished.
  
This downloads the data locally, but working with the data in this form is inconvenient. The next post will cover how to import these CSV files into a MySQL database.
Reference: https://blog.csdn.net/pythoncodez/article/details/77623287


You may freely share or copy this text, but please keep this document's URL as the reference URL.

Licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
