Me: "Dissolve e-Stat boundary data with GeoPandas and save it to a GeoPackage"

Tags: pandas, Python, GeoPackage, geopandas, e-Stat

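This title looks like something Lou ○shiba would dive into with a huge grin, but here we go: day 21 of the MIERUNE Advent Calendar 2020.

Getting the boundary data

Boundary data for all of Japan is very easy to get from e-Stat.

You can search for and download the boundary data from the following link.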

https://www.e-stat.go.jp/gis/statmap-search?page=1&type=2&aggregateUnitForBoundary=A&toukeiCode=00200521&toukeiYear=2015&serveyId=A002005212015&coordsys=1&

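By prefecture or by municipality

Three formats: shp / XML / GML

World geodetic system (latitude/longitude) or world geodetic system (plane rectangular coordinate system)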

Trying a download

The Oshiba in my heart kept whispering something, but let's ignore him and move on.

The shp file for Hokkaido is available at the following URL.

https://www.e-stat.go.jp/gis/statmap-search/data?dlserveyId=A002005212015&code=01&coordSys=1&format=shape&downloadType=5

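You can change which prefecture is downloaded by changing the query parameter code=01.

See the prefecture code list for the available values (two digits, from 01 to 47).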
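As a small sketch of how the code parameter works, the following builds one download URL per prefecture by zero-padding the numbers 1 to 47. The helper name build_url and the range check are made up for this illustration; the URL itself is the Hokkaido URL above with the code value replaced by a placeholder.

```python
# URL template with a placeholder for the two-digit prefecture code.
URL_TEMPLATE = (
    "https://www.e-stat.go.jp/gis/statmap-search/data"
    "?dlserveyId=A002005212015&code={pref_code}"
    "&coordSys=1&format=shape&downloadType=5"
)


def build_url(pref_number: int) -> str:
    """Return the download URL for a prefecture number (1 to 47).

    build_url is a hypothetical helper used only in this sketch.
    """
    if not 1 <= pref_number <= 47:
        raise ValueError(f"prefecture number out of range: {pref_number}")
    # f"{n:02d}" turns 1 into "01", 13 into "13", and so on.
    return URL_TEMPLATE.format(pref_code=f"{pref_number:02d}")


urls = [build_url(n) for n in range(1, 48)]
print(urls[0])    # the Hokkaido URL (code=01)
print(len(urls))  # 47
```

Zero-padding matters here because the parameter is a two-digit string, so 1 has to be sent as 01.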

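Writing the script

First, let's write a script that downloads the boundary data (shp) for every prefecture.

It uses the following two external packages, so let's install them.

tqdm: the cool one that displays a progress bar

requests: the dependable one that makes HTTP requests easy

Incidentally, for various reasons the script below could not get the total size of the target files, so tqdm does not draw the bar itself, but it does show the download speed.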

pip install tqdm
pip install requests

The script ended up looking like this.

import os
import re
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

import requests
from tqdm import tqdm

URL = "https://www.e-stat.go.jp/gis/statmap-search/data?dlserveyId=A002005212015&code={pref_code}&coordSys=1&format=shape&downloadType=5"

str_pref = [str(pref_code).zfill(2) for pref_code in range(1, 48)]
args = [(URL.format(pref_code=p), str(Path("./").resolve())) for p in str_pref]


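def main():
    """Run the downloads for every prefecture."""
    with ThreadPoolExecutor() as executor:
        # list() consumes the results so errors raised in workers surface here
        list(executor.map(lambda p: file_download(*p), args))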


def get_file_name_from_response(url, response):
    """Get the file name from the response's Content-Disposition header.

    Falls back to the last segment of the URL if it cannot be found.

    Args:
        url (str): Request URL
        response (Response): Response object

    Returns:
        str: File name

    """
    # The header may be absent, so use .get() instead of indexing
    disposition = response.headers.get("Content-Disposition", "")
    try:
        file_name = re.findall(r"filename.+''(.+)", disposition)[0]
    except IndexError:
        print("Could not get the file name from the response")
        file_name = os.path.basename(url)
    return file_name


def file_download(url, dir_path, overwrite=True):
    """Download a file from a URL into the given directory.

    Args:
        url (str): Download link
        dir_path (str): Path string of the directory to save into
        overwrite (bool): Overwrite option. If True, overwrite an existing file

    Returns:
        Path: Path object of the downloaded file

    Notes:
        If the file already exists and overwrite=False, nothing is
        downloaded and the existing file path is returned

    """
    res = requests.get(url, stream=True)

    save_dir = Path(dir_path)
    file_name = get_file_name_from_response(url, res)
    download_path = save_dir / file_name

    if download_path.exists() and not overwrite:
        print("The file already exists and overwrite=False, so the download is skipped.")
        return download_path

    print(f"{url=}, {res.status_code=}")
    if res.status_code != 200:
        print("The request did not succeed. Exiting.")
        sys.exit(1)

    # Content-Length is not always present, so fall back to None
    content_length = res.headers.get("content-length")
    file_size = int(content_length) if content_length is not None else None

    print(f"Starting download of {file_name}")
    with download_path.open("wb") as file:
        with tqdm(total=file_size, unit="B", unit_scale=True) as progress_bar:
            for chunk in res.iter_content(chunk_size=1024):
                file.write(chunk)
                progress_bar.update(len(chunk))
    return download_path


if __name__ == '__main__':
    main()

When you run the script, look at line 13:

args = [(URL.format(pref_code=p), str(Path("./").resolve())) for p in str_pref]

The zip files containing the shp files are downloaded into the directory specified by Path("./") on that line.
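The names of those zip files come from the Content-Disposition header of each response. As a rough sketch of that step, the following parses an RFC 5987 style filename*= value and falls back to the end of the URL. This is a variant of the script's regular expression that also percent-decodes the name; the function name and the sample header values are hypothetical, and the real e-Stat header may look different.

```python
import os
import re
from urllib.parse import unquote


def file_name_from_disposition(disposition: str, url: str) -> str:
    """Pick the file name out of a Content-Disposition value.

    Falls back to the last path segment of the URL when no
    "filename*=charset''name" part is present.
    """
    match = re.search(r"filename\*=[^']*''([^;]+)", disposition)
    if match:
        # The name is percent-encoded, so decode it.
        return unquote(match.group(1).strip())
    return os.path.basename(url)


# Hypothetical header values, for illustration only.
header = "attachment; filename*=UTF-8''pref%5F01.zip"
print(file_name_from_disposition(header, "https://example.com/data"))
print(file_name_from_disposition("attachment", "https://example.com/fallback.zip"))
```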

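Because ThreadPoolExecutor runs the requests concurrently on multiple threads, so that one download does not have to wait for another to finish, the downloads are blazing fast.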
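To see why a thread pool helps with this kind of I/O-bound work, here is a minimal sketch that replaces the network request with time.sleep. The function fake_download and the 0.2 second delay are assumptions for the demonstration, not part of the original script.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(pref_code: str) -> str:
    """Stand-in for a network request: wait, then return a file name."""
    time.sleep(0.2)  # simulates waiting on the network
    return f"{pref_code}.zip"


codes = [f"{n:02d}" for n in range(1, 9)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as executor:
    # map() keeps the input order; list() also re-raises worker errors.
    results = list(executor.map(fake_download, codes))
elapsed = time.perf_counter() - start

print(results)
print(f"{elapsed:.2f}s for {len(codes)} simulated downloads")
```

Run one after another, the eight waits would take about 1.6 seconds in total; with eight worker threads they overlap, so the elapsed time stays close to a single wait.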

Surprisingly easily, we got the boundary data for all of Japan!

Loading and merging with GeoPandas

Let's read the downloaded zip files with GeoPandas.

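import glob
from pathlib import Path

import geopandas as gpd
import pandas as pd

# Collect the downloaded zip files in the current directory
zip_path_list = sorted(Path(p) for p in glob.glob("./*.zip"))

# Read each zip into a GeoDataFrame and stack them vertically
town_gdf = pd.concat(
    [gpd.read_file("zip://" + str(zip_path)) for zip_path in zip_path_list],
    ignore_index=True,
).pipe(gpd.GeoDataFrame)

# PREF (prefecture code) + CITY (city code) identifies a municipality
town_gdf["AREA_CODE"] = town_gdf["PREF"].str.cat(town_gdf["CITY"])
# Merge town polygons into municipalities, then municipalities into prefectures
city_gdf = town_gdf.dissolve(by="AREA_CODE", as_index=False)
pref_gdf = city_gdf.dissolve(by="PREF", as_index=False)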

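Build the list of zip files with glob
→ use a list comprehension to turn the list of zip files into a list of geopandas GeoDataFrame objects
→ concatenate the data frames vertically with pandas.concat
→ finally, convert the result to a geopandas.GeoDataFrame.

The shp files downloaded this time are detailed polygon data down to the town and chome level, so I also used the GeoDataFrame.dissolve method with a column name to create polygons per municipality and per prefecture.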
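To make the dissolve step concrete, here is a toy sketch that uses four made-up unit squares in place of the town polygons. Only the PREF and CITY column names are taken from the code above; the codes and geometries are invented for the illustration.

```python
import geopandas as gpd
from shapely.geometry import box

# Four unit squares standing in for town-level polygons (made-up data).
town_gdf = gpd.GeoDataFrame(
    {
        "PREF": ["01", "01", "01", "02"],
        "CITY": ["101", "101", "102", "201"],
        "geometry": [
            box(0, 0, 1, 1),
            box(1, 0, 2, 1),
            box(2, 0, 3, 1),
            box(0, 2, 1, 3),
        ],
    }
)

# Prefecture code + city code gives one key per municipality.
town_gdf["AREA_CODE"] = town_gdf["PREF"].str.cat(town_gdf["CITY"])

# Rows sharing a key are merged into a single geometry.
city_gdf = town_gdf.dissolve(by="AREA_CODE", as_index=False)
pref_gdf = city_gdf.dissolve(by="PREF", as_index=False)

print(len(town_gdf), len(city_gdf), len(pref_gdf))  # 4 3 2
```

The four squares collapse into three municipalities and then two prefectures. Columns other than the key keep the first value in each group, which is why PREF is still available for the second dissolve.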

Saving to GeoPackage

Yes, this is all it takes!

pref_gdf.to_file("boundary.gpkg", layer='pref', driver="GPKG")
city_gdf.to_file("boundary.gpkg", layer='city', driver="GPKG")
town_gdf.to_file("boundary.gpkg", layer='town', driver="GPKG")

This creates a GeoPackage named boundary.gpkg that contains three layers: pref, city, and town!
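One way to check which layers ended up in the file is to query it directly: a GeoPackage is an SQLite database, and its gpkg_contents table lists the registered layers. The sketch below uses only the standard library; list_gpkg_layers is a hypothetical helper name, not something from the original post.

```python
import sqlite3
from contextlib import closing


def list_gpkg_layers(path: str) -> list:
    """Return the names of the vector layers registered in gpkg_contents."""
    with closing(sqlite3.connect(path)) as conn:
        rows = conn.execute(
            "SELECT table_name FROM gpkg_contents "
            "WHERE data_type = 'features' ORDER BY table_name"
        ).fetchall()
    return [row[0] for row in rows]
```

Calling list_gpkg_layers("boundary.gpkg") on the file written above should list the city, pref, and town layers.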

Let's display it

Looks good!!!!

e-Stat makes boundary data for the whole country this easy to use. It's the best!!!!!

Let's all make more and more use of it!

Reference

Original article (Qiita): https://qiita.com/nokonoko_1203/items/b726ceebdfa1485f688b
