Glue – Athena: Custom Output with a Fixed Number of Files

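Situation:

If I use only a PARTITION clause, the S3 bucket ends up with too many files smaller than 1 MB each. That many small files slows queries down, so I want to write fewer, larger files instead.

Solutions:

Solution 1: Use Athena bucketing to set the number of output files.

For details, see this AWS article: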
How can I set the number or size of files when I run a CTAS query in Athena?
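
As a rough illustration of the bucketing approach, the sketch below builds a CTAS statement that uses Athena's `bucketed_by` and `bucket_count` table properties. The helper name `build_bucketed_ctas` and every table, column, and S3 name are hypothetical placeholders, not taken from the original article. For an unpartitioned table, the bucket count should roughly determine how many files Athena writes.

```python
def build_bucketed_ctas(new_table, source_table, s3_location,
                        bucket_column, bucket_count):
    """Return an Athena CTAS statement for a bucketed Parquet table.

    Hypothetical helper: all names passed in are placeholders that you
    replace with your own table, column, and S3 location.
    """
    if bucket_count < 1:
        raise ValueError("bucket_count must be at least 1")
    return (
        f"CREATE TABLE {new_table}\n"
        f"WITH (\n"
        f"  format = 'PARQUET',\n"
        f"  external_location = '{s3_location}',\n"
        f"  bucketed_by = ARRAY['{bucket_column}'],\n"
        f"  bucket_count = {bucket_count}\n"
        f")\n"
        f"AS SELECT * FROM {source_table}"
    )


if __name__ == "__main__":
    # Placeholder names for illustration only.
    print(build_bucketed_ctas(
        new_table="demo_bucketed",
        source_table="demo_source",
        s3_location="s3://<your_s3_path>/demo_bucketed/",
        bucket_column="locationid",
        bucket_count=20,
    ))
```

Paste the printed statement into the Athena query editor. A bucket column with many distinct, evenly distributed values tends to give files of similar size.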

However, bucketing has one drawback: bucketed tables do not support INSERT INTO queries. That is where Solution 2 comes in.

Solution 2: Use Glue repartition

The context is the same, but now I also want to use INSERT INTO queries.

For the procedure, follow this AWS blog post:
Build a Data Lake Foundation with AWS Glue and Amazon S3

At step "13. View the job", add the following code to the job:

datasource_df = dropnullfields3.repartition(<number of output file you want here>)

right after the line:

dropnullfields3 = DropNullFields.apply(frame = resolvechoice2, transformation_ctx = "dropnullfields3")

Then change this line:

datasink4 = glueContext.write_dynamic_frame.from_options(frame = dropnullfields3, connection_type = "s3", connection_options = {"path": "<your_s3_path>"}, format = "parquet", transformation_ctx = "datasink4")

to:

datasink4 = glueContext.write_dynamic_frame.from_options(frame = datasource_df, connection_type = "s3", connection_options = {"path": "<your_s3_path>"}, format = "parquet", transformation_ctx = "datasink4")
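
The number passed to `repartition` is left open above. One way to choose it is to divide the total data size by a target file size. The sketch below does that in plain Python. The helper name `output_file_count` and the 128 MB default are assumptions for illustration, not values from the original article.

```python
def output_file_count(total_bytes, target_file_bytes=128 * 1024 * 1024):
    """Estimate how many output files to request from repartition().

    Hypothetical helper: the 128 MB default target is an assumption.
    Always returns at least 1 so the job still writes a file.
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must not be negative")
    if target_file_bytes <= 0:
        raise ValueError("target_file_bytes must be positive")
    # Integer ceiling division avoids floating-point rounding.
    files = (total_bytes + target_file_bytes - 1) // target_file_bytes
    return max(1, files)


if __name__ == "__main__":
    one_mb = 1024 * 1024
    # Example: 5000 small files of about 1 MB each.
    print(output_file_count(5000 * one_mb))
```

You would then pass the result in place of the placeholder in `dropnullfields3.repartition(...)`. The sizes of the files actually written also depend on compression and on how evenly the rows are spread across partitions, so treat the result as a starting point.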

To learn more about Glue repartitioning, see the AWS Glue documentation.

Try querying with Athena
Create table:

CREATE EXTERNAL TABLE IF NOT EXISTS demo_query (
  dispatching_base_num string,
  pickup_date string,
  locationid bigint)
STORED AS PARQUET
LOCATION 's3://athena-examples/parquet/'
tblproperties ("parquet.compress"="SNAPPY");

Try to insert:

insert into demo_query ("dispatching_base_num", "pickup_date", "locationid") values ('aa23dtgt', '2020-12-03', 1234);

The insert query should now work. Success!

Reference

Original article: Glue – Athena custom output fixed number of files, https://dev.to/yentrinh/glue-athena-custom-output-fixed-number-of-files-2alb
