Scraping the SBI Securities Portfolio Page

Introduction

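I had been a regular user of myTrade, but its support ends on 1/9. I looked at various other apps and found none that fit, so I decided to scrape the SBI portfolio page myself and manage the data in a Google spreadsheet.
This page therefore introduces the following two programs:
1. Scraping the SBI Securities portfolio page
2. Writing the scraped data to a Google spreadsheet
Each run writes one new row to the sheet, with the date in the first column and each holding's amount in the column whose header matches it.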

Environment

OS: Mac

Language: Python 3.7

Third-party

selenium

ChromeDriver

Beautiful Soup

pandas

Google Chrome

Procedure

Environment setup

Prerequisites:
* Python is installed.
* pip is installed.

1. Install the required modules

Install the required modules with pip

pip install selenium
pip install pandas lxml html5lib BeautifulSoup4
pip install gspread oauth2client

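Install ChromeDriver

Download the ChromeDriver that corresponds to your Google Chrome version and place it in a directory that is on your PATH. (Reference: adding a directory to PATH on Mac)
The ChromeDriver version you download should match the version of Google Chrome you are using.
If there is no exact match, use the closest one.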
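
One way to check the version match before running the scraper is to compare major version numbers. The following is a minimal sketch and not part of the original script: `major_version` and `versions_match` are hypothetical helpers, and the sample version strings are made up for illustration. On a real machine the strings would come from running `chromedriver --version` and the Chrome binary with `--version`.

```python
import re


def major_version(version_output):
    """Return the major version number found in a `--version` output string."""
    match = re.search(r'(\d+)\.\d+\.\d+', version_output)
    if match is None:
        raise ValueError('no version number found in: ' + version_output)
    return int(match.group(1))


def versions_match(chrome_output, driver_output):
    """True when Chrome and ChromeDriver share the same major version."""
    return major_version(chrome_output) == major_version(driver_output)


# Illustrative strings only; real values come from the two `--version` commands.
chrome = 'Google Chrome 79.0.3945.88'
driver = 'ChromeDriver 79.0.3945.36 (3582db32b33893869b8c1339e8f4d9ed1816f143)'
print(versions_match(chrome, driver))
```

Comparing only the major number is a simplification; it is enough to catch the common case where Chrome has auto-updated past the installed driver.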

2. Google setup
The page below is excellent, so follow it as written:
https://tanuhack.com/operate-spreadsheet/#Google_Cloud_Platform

Code

1. Imports

import time
import datetime
import gspread
import json
import pandas
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from bs4 import BeautifulSoup
from oauth2client.service_account import ServiceAccountCredentials

2. The scraping part

class Result:
    """One holding: a 'custody:fund' label and its amount."""
    def __init__(self, fund, amount):
        self.fund = fund
        self.amount = amount

def convert_to_list(data_frame, custody):
    # Keep only the fund-name column (1) and the amount column (10),
    # and drop row 0, which is not a holding.
    data_frame = data_frame.iloc[:, [1, 10]].drop(index=0)
    data_frame.columns = ['funds', 'amount']

    # Prefix each fund name with the custody type so it can be matched
    # against the spreadsheet header later.
    results = []
    for row in data_frame.itertuples(index=False):
        results.append(Result(custody + ':' + row.funds, row.amount))

    return results

# By supplies the locator strategies passed to find_element below
from selenium.webdriver.common.by import By

def get_stocks():
    options = Options()
    # Headless mode (run Chrome without showing a window)
    options.add_argument('--headless')
    # Create the Chrome WebDriver object
    driver = webdriver.Chrome(options=options)

    # Open the SBI Securities top page
    driver.get('https://www.sbisec.co.jp/ETGate')

    # Enter the user ID and password
    input_user_id = driver.find_element(By.NAME, 'user_id')
    input_user_id.send_keys('xxxx')

    input_user_password = driver.find_element(By.NAME, 'user_password')
    input_user_password.send_keys('yyyy')

    # Click the login button to log in
    # The body seems to load asynchronously, so sleep briefly
    driver.find_element(By.NAME, 'ACT_login').click()
    time.sleep(5)
    driver.find_element(By.LINK_TEXT, 'ポートフォリオ').click()

    # page_source is already a str, so no UTF-8 conversion is applied
    html = driver.page_source

    # Parse with BeautifulSoup
    soup = BeautifulSoup(html, "html.parser")

    # Pick out the holdings tables by their layout attributes
    table = soup.find_all("table", border="0", cellspacing="1", cellpadding="4", bgcolor="#9fbf99", width="100%")
    # Parse the matched tables once: index 0 is labeled 特定, index 1 is labeled NISA
    tables = pandas.read_html(str(table))
    stocks = convert_to_list(tables[0], '特定')
    nisa = convert_to_list(tables[1], 'NISA')

    driver.quit()
    return stocks + nisa
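
The function above has the user ID and password written directly in the source (the 'xxxx' and 'yyyy' placeholders). A common alternative is to read them from environment variables so the script can be shared without exposing secrets. This is a minimal sketch under that assumption: `load_credentials` and the variable names `SBI_USER_ID` and `SBI_USER_PASSWORD` are hypothetical and do not appear in the original script.

```python
import os


def load_credentials(env=None):
    """Return (user_id, password) read from environment variables.

    The variable names are illustrative; choose whatever fits your setup.
    """
    env = os.environ if env is None else env
    names = ('SBI_USER_ID', 'SBI_USER_PASSWORD')
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError('missing environment variables: ' + ', '.join(missing))
    return env['SBI_USER_ID'], env['SBI_USER_PASSWORD']
```

Inside get_stocks, the two send_keys calls would then receive the values returned by `load_credentials()` instead of the literals.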

3. Write to the spreadsheet

def write(stocks):
    scope = ['https://spreadsheets.google.com/feeds','https://www.googleapis.com/auth/drive']

    # Set up the credentials
    # Pass the name of the downloaded JSON key file (it holds the private key; keep it where the Python file can read it easily)
    credentials = ServiceAccountCredentials.from_json_keyfile_name('zzzzz.json', scope)

    # Log in to the Google API with the OAuth2 credentials
    gc = gspread.authorize(credentials)

    # Store the key of the shared spreadsheet in SPREADSHEET_KEY
    SPREADSHEET_KEY = 'hogehoge'

    # Open sheet 1 of the shared spreadsheet
    worksheet = gc.open_by_key(SPREADSHEET_KEY).sheet1
    headers = worksheet.row_values(1)
    dates = worksheet.col_values(1)
    # The new row goes right below the last filled cell of the date column
    new_row_num = len(dates) + 1

    worksheet.update_cell(new_row_num, 1, datetime.datetime.today().strftime('%Y/%m/%d'))
    # Write each amount into the column whose header matches the fund label
    for stock in stocks:
        for i in range(len(headers)):
            if headers[i] == stock.fund:
                worksheet.update_cell(new_row_num, i + 1, stock.amount)
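
write() issues one update_cell call per matching holding, and each call is a separate request to the API. As a rough sketch of an alternative, the row can be assembled in memory first and sent in one call. `build_row` is a hypothetical helper, not part of the original script; the namedtuple stands in for the Result class so the example runs on its own, and the header names and amounts are made-up sample data.

```python
from collections import namedtuple

# Stand-in for the Result class so the example runs on its own.
Result = namedtuple('Result', ['fund', 'amount'])


def build_row(headers, stocks, date_text):
    """Return a list aligned with `headers`: date first, then one amount per column.

    Columns with no matching holding are left as empty strings.
    """
    amounts = {stock.fund: stock.amount for stock in stocks}
    row = [date_text]
    for header in headers[1:]:
        row.append(amounts.get(header, ''))
    return row


# Made-up sample data: the first header is the date column.
headers = ['date', '特定:Fund A', 'NISA:Fund B', '特定:Fund C']
stocks = [Result('特定:Fund A', 120000), Result('NISA:Fund B', 45000)]
row = build_row(headers, stocks, '2020/01/10')
print(row)
```

With gspread, a list like this can be added as a new row with `worksheet.append_row(row)`, assuming the date column has no gaps so that the appended row lands where write() would have put it.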

4. Put the pieces above together

def main():
    # Scrape the portfolio to get the data to write
    stocks = get_stocks()
    # Write the retrieved data to the spreadsheet
    write(stocks)

if __name__ == "__main__":
    main()

Reference pages:

SBI Securities scraping:
https://hato.yokohama/scraping_sbi_investment/
Python + selenium:
htps : // 코 m / 현기증 / ms / 20 02161 7 7 18d8 693

Summary of scraping basics with Beautiful Soup:
h tps:// 퀵했다. 작은 m / 우마 / MS / 896C49d46585 그림 32 f7b1

pandas:

h tps : // 시니 r케. 하테나 bぉg. 코 m / 엔트리 / 뉴몬 팬더 s

https://pandas.pydata.org/pandas-docs/stable/index.html

[No more second-guessing] Summary of the initial setup for reading and writing spreadsheets from Python: https://tanuhack.com/operate-spreadsheet/#Google_Cloud_Platform

Summary of the initial setup for reading and writing spreadsheets from Python: https://tanuhack.com/library-gspread/

gspread official: https://gspread.readthedocs.io/en/latest/index.html