[Personal Project] An Automatically Generated Ranking Site (Python/Django)

What I wanted to do

I have built personal websites with HTML, CSS, and JavaScript, but updating them by hand gets tedious. So I wanted to build a website whose content updates itself automatically.

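Site overview

For the scenic spots that appear on sites featuring lists such as "100 Most Spectacular Views in the World," the site assigns a score based on how often each spot appears, its Google search results, and my own points, and automatically generates a ranking of the world's scenic spots. (Update frequency: once or twice a day)

1. Top 10 popular scenic spots
Spots that appear frequently on the roundup sites, plus my own points, plus their Google search results.

2. Top 10 little-known scenic spots
Spots that the roundup sites do mention but that do not come up often, displayed at random and weighted toward spots with fewer Google search results.

3. The Google Maps API is used to pin the locations of the top 5 spots of each ranking on a map.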
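
The exact scoring rule is not spelled out in this article, so the following is only a minimal sketch of how the three signals could be combined. The weights, the log scaling of the search hit count, the `max_mentions` threshold, and all function names are assumptions made for illustration, not the site's actual formula.

```python
import math
import random
from collections import Counter

def count_mentions(keywords, scraped_titles):
    """Count how many scraped titles mention each keyword."""
    counts = Counter()
    for title in scraped_titles:
        for kw in keywords:
            if kw in title:
                counts[kw] += 1
    return counts

def popular_score(mentions, own_points, search_hits):
    """Hypothetical score: frequency + personal points + log-scaled hit count."""
    return mentions + own_points + math.log10(search_hits + 1)

def popular_top(spots, scraped_titles, n=10):
    """spots maps keyword -> (own_points, search_hits); return the n best keywords."""
    counts = count_mentions(spots, scraped_titles)
    scored = {
        kw: popular_score(counts[kw], pts, hits)
        for kw, (pts, hits) in spots.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:n]

def hidden_picks(spots, scraped_titles, max_mentions=1, n=10, rng=random):
    """Randomly pick up to n rarely mentioned keywords, fewest search hits first."""
    counts = count_mentions(spots, scraped_titles)
    # "Rare" here means mentioned at least once, but no more than max_mentions times
    rare = [kw for kw in spots if 1 <= counts[kw] <= max_mentions]
    picked = rng.sample(rare, min(n, len(rare)))
    return sorted(picked, key=lambda kw: spots[kw][1])
```

Log scaling is used here only so that a spot with millions of search hits does not drown out the frequency and personal points; any other normalization would serve the same purpose.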

Public site

Rental server

AWS (updates currently suspended)

Technologies used

Scraping with Python 3.6 (BeautifulSoup)

Django 2.0 + MySQL

Google Maps API v3

Google Custom Search API

※ This was my first time using both Python scraping and Django.

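Procedure and key points

1. Building the ranking (Python)

① Scrape about 5 to 10 sites that introduce the world's scenic spots with Python, extract the names of the spots (update frequency: once a day), and store them in the DB (MySQL).

Scraping example: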

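import urllib.request
import urllib.error
from bs4 import BeautifulSoup

def getURL(urlInfo, tagInfo):
    """Fetch urlInfo and return every tagInfo element, or None on failure."""
    # Set a browser-like User-Agent header
    ua = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36'

    try:
        req = urllib.request.Request(urlInfo, headers={'User-Agent': ua})
        # Access the URL; the response body is HTML
        with urllib.request.urlopen(req) as html:
            # Parse the HTML with BeautifulSoup
            soup = BeautifulSoup(html, 'html.parser')
        # Extract every matching element
        return soup.find_all(tagInfo)
    except (urllib.error.URLError, ValueError):
        # Network error or malformed URL: report the failure to the caller
        return None

def makeList(tagInfo, findInfo, urlInfo):
    """Append [text, url] to results for each tag whose text matches findInfo."""
    # Loop over the scraped tags and collect their text
    for t1 in tagInfo:
        text = t1.string
        if text is None:
            continue
        # Keep everything when no search word is given, otherwise only matches
        if findInfo is None or findInfo in text:
            results.append([text, urlInfo])

# Collected [text, url] pairs (renamed from "list" to avoid shadowing the built-in)
results = []

# Run the scrape
url1 = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxx'  # set the target URL
getInfo = getURL(url1, 'h2')           # pass the URL and the tag name
if getInfo is None:
    print('Failed to fetch data')
else:
    # Build the list with makeList
    makeList(getInfo, '位', url1)       # pass the search word ('位' marks a rank such as "1位")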

② Upload the "My scenic spots list" prepared in advance (compiled by hand from information scraped once) to the DB.

Example DB upload:

import csv
import mysql.connector

# Read the local data list mylist.txt and update/insert it into the MySQL table ranks_mylist
# Connect to MySQL
cnt = mysql.connector.connect(
    host='xxxxxxxx',
    db='xxxxxxx',
    user='xxxxxxx',
    password='xxxxxxx',
    charset='utf8'
)

# Get a cursor
db = cnt.cursor(buffered=True)

# Read mylist and select from the DB; insert when no row exists, update when one does
with open("./mylist.txt", "r", encoding="utf-8-sig") as f:
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        keyword = row[0]
        name = row[1]
        country = row[2]
        point = row[3]
        # Select existing rows by keyword (placeholders avoid quoting bugs and SQL injection)
        db.execute("SELECT id FROM ranks_mylist WHERE keyword = %s", (keyword,))
        row1 = db.fetchall()
        # Update when the keyword is already in the DB
        if row1:
            for rdata in row1:
                db.execute(
                    "UPDATE ranks_mylist SET name = %s, country = %s, point = %s WHERE id = %s",
                    (name, country, point, rdata[0]),
                )
        # Insert when the keyword is not in the DB yet
        else:
            db.execute(
                "INSERT INTO ranks_mylist (keyword, name, country, point) VALUES (%s, %s, %s, %s)",
                (keyword, name, country, point),
            )

# Close the cursor
db.close()
# Commit
cnt.commit()
# Disconnect from MySQL
cnt.close()
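
The upload loop reads four tab-separated columns (row[0] to row[3]) from mylist.txt, so a malformed line would fail in the middle of the DB writes. As a rough sketch, the file could be validated up front with a small reader like the one below. The function name `read_mylist`, the sample rows, and the assumption that point is an integer are all hypothetical; the real mylist.txt is not shown in this article.

```python
import csv
import io

def read_mylist(f):
    """Yield (keyword, name, country, point) tuples from a tab-separated file object."""
    for lineno, row in enumerate(csv.reader(f, delimiter='\t'), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) < 4:
            raise ValueError(f"line {lineno}: expected 4 columns, got {len(row)}")
        keyword, name, country, point = row[:4]
        # Assumption: point is stored as an integer
        yield keyword, name, country, int(point)

# Hypothetical sample data standing in for mylist.txt
sample = io.StringIO(
    "uyuni\tSalar de Uyuni\tBolivia\t5\n"
    "\n"
    "santorini\tSantorini\tGreece\t3\n"
)
rows = list(read_mylist(sample))
```

Each yielded tuple can be passed directly as the parameters of an INSERT or UPDATE statement.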

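③
Match the spots in My scenic spots list against the scraped information to build the ranking, and store it in the DB.
During matching, the Google Custom Search API is used so that the number of search results for each keyword is also factored in.
[Reference URL] Getting Google search results with the Custom Search API
At the same time, one Google image search result is fetched and its image URL is stored in the DB as well.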
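
The article does not include the code for this step, so here is a minimal sketch of how the two lookups might be done against the Custom Search JSON API using only the standard library. The helper names are hypothetical, and the API key and search engine ID (`cx`) are placeholders you would supply yourself. The sketch relies on the documented response fields `searchInformation.totalResults` (returned as a string) and `items[].link`.

```python
import json
import urllib.parse
import urllib.request

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"

def build_search_url(api_key, engine_id, query, image=False):
    """Build a Custom Search JSON API request URL for one keyword."""
    params = {"key": api_key, "cx": engine_id, "q": query, "num": 1}
    if image:
        # Restrict the results to images
        params["searchType"] = "image"
    return API_ENDPOINT + "?" + urllib.parse.urlencode(params)

def total_results(response):
    """Read the estimated hit count; the API returns it as a string."""
    return int(response.get("searchInformation", {}).get("totalResults", "0"))

def first_image_link(response):
    """Return the URL of the first result, or None when there are no items."""
    items = response.get("items") or []
    return items[0].get("link") if items else None

def fetch(url):
    """Perform the HTTP request and decode the JSON body (needs a valid key)."""
    with urllib.request.urlopen(url) as res:
        return json.load(res)
```

In use, `total_results(fetch(build_search_url(key, cx, keyword)))` would give the hit count to add to the score, and `first_image_link(fetch(build_search_url(key, cx, keyword, image=True)))` the image URL to store.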

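2. Displaying the ranking site (Django + Google Maps API)

① Display the site using the ranking table that was built

The table cells are written in a loop with JavaScript's document.write. The links point to Google's regular search and image search.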

<caption>人気の世界の絶景</caption>
<thead class="tablehead1">
  <tr>
    <td>1位</td><td>2位</td><td>3位</td><td>4位</td><td>5位</td>
  </tr>
</thead>
<tbody>
  <tr>
   <script type="text/javascript">
     var flg = 1;
     {% for rank in ranks.all %}
       if(flg <= 5){
         document.write("<td><a href='http://www.google.com/search?q={{ rank.name }}' target='_blank'>{{ rank.name }}</a><br>{{ rank.country }}</td>");
       }
       flg ++;
     {% endfor %}
   </script>
 </tr>
 <tr>
   <script type="text/javascript">
     flg = 1;
     {% for rank in ranks.all %}
       if(flg <= 5){
         document.write("<td><a href='http://www.google.com/search?q={{ rank.name }}&tbm=isch' target='_blank'><img src='{{ rank.imageurl }}' width='200' height='150'></a></td>");
       }
       flg ++;
     {% endfor %}
   </script>
 </tr>
</tbody>
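
The template above interpolates rank.name straight into the query string, so a name containing spaces or "&" could produce a malformed link. One possible approach, sketched below, is to build percent-encoded links on the Python side, for example in the view. The helper name `search_links` is hypothetical and not part of the original site.

```python
from urllib.parse import urlencode

GOOGLE_SEARCH = "https://www.google.com/search"

def search_links(name):
    """Return (web_search_url, image_search_url) for a spot name, percent-encoded."""
    web = GOOGLE_SEARCH + "?" + urlencode({"q": name})
    # tbm=isch switches Google to image search, as in the template above
    image = GOOGLE_SEARCH + "?" + urlencode({"q": name, "tbm": "isch"})
    return web, image

web, image = search_links("Salar de Uyuni")
```

Alternatively, Django's built-in urlencode template filter can be applied to rank.name inside the template itself.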