AWS S3: Uploading a Large Excel File

Has anyone uploaded a large Excel file to S3 with a multipart upload and then read the data back from the uploaded file?

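Currently I convert the file bytes to a pandas DataFrame, split that DataFrame with numpy's array_split, and upload each chunk as one part of a multipart upload. Is this the right approach, or is there an alternative? I ask because converting the bytes to pandas just to perform the multipart upload takes a lot of time.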

Here is my code:
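import io
from typing import Any

import numpy as np
import pandas as pd

# get_boto3_s3_client, S3UtilsBase, settings, AWS_S3, UserSettings, RootModel and
# AttachmentResponseFromS3 come from the project and are not shown here.
def multipart_upload(self, filename: str, user_settings: UserSettings, model: RootModel, location: str, content_buffer: Any,
                     content_df: pd.DataFrame, content_type: str):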

    s3_client = get_boto3_s3_client()
    chunksize = 5 * 1024 * 1024

    part_number = 0
    chunk: pd.DataFrame
    parts_info = []

    key_name = S3UtilsBase().prepare_s3_key_path(filename=filename,
                                                 location=location,
                                                 model=model,
                                                 user_settings=user_settings)

    multipart_upload_resp = s3_client.create_multipart_upload(
        Bucket=settings.AWS_S3_BUCKET, Key=key_name)

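    # max(1, ...) avoids asking array_split for zero sections when the file is smaller than one chunk
    for chunk in np.array_split(content_df, max(1, len(content_buffer.getvalue()) // chunksize)):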
        buffer = io.BytesIO()
        part_number = part_number + 1
        excel_file_types = S3UtilsBase().get_excel_types()
        if content_type == 'csv':
            chunk.to_csv(buffer)
        elif content_type in excel_file_types:
            chunk.to_excel(buffer)

        chunk_resp = s3_client.upload_part(Bucket=settings.AWS_S3_BUCKET,
                                           Key=multipart_upload_resp['Key'],
                                           PartNumber=part_number,
                                           UploadId=multipart_upload_resp['UploadId'],
                                           Body=buffer.getvalue())

        parts_info.append({
            'PartNumber': part_number,
            'ETag': chunk_resp['ETag']
        })

    parts_info = sorted(parts_info, key=lambda x: x["PartNumber"])
    cmp_multipart_upload_resp = s3_client.complete_multipart_upload(Bucket=settings.AWS_S3_BUCKET,
                                                                    Key=multipart_upload_resp['Key'],
                                                                    UploadId=multipart_upload_resp['UploadId'],
                                                                    MultipartUpload={"Parts": parts_info})

    paths = cmp_multipart_upload_resp["Key"].split("/")
    separator = "/"
    prefix = separator.join(paths[:-1])
    return AttachmentResponseFromS3(
        Prefix=prefix,
        Error=False,
        Message='',
        Version=cmp_multipart_upload_resp["VersionId"],
        Host=f'{AWS_S3}://{cmp_multipart_upload_resp["Bucket"]}',
        FileName=paths[-1],
        Parts=part_number
    )
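
One possible alternative, sketched here under assumptions rather than offered as a definitive answer: since the original bytes are already in content_buffer, the parts could be cut directly from those bytes, with no DataFrame in between. S3 joins the parts back together byte for byte, so the completed object would be the same file that was uploaded. Serializing each DataFrame chunk with to_excel, by contrast, produces a separate workbook per part. In the sketch below, upload_raw_multipart, iter_parts and part_size are hypothetical names, s3_client is assumed to be a boto3 S3 client, and the stream is assumed to be a BytesIO-like buffer whose read(n) returns n bytes until the end of the data. S3 requires every part except the last to be at least 5 MiB.

```python
import io
from typing import Any, Dict, Iterator, List

MIN_PART_SIZE = 5 * 1024 * 1024  # S3 minimum for every part except the last


def iter_parts(stream: Any, part_size: int) -> Iterator[bytes]:
    """Yield consecutive blocks of at most part_size bytes until the stream is empty."""
    while True:
        block = stream.read(part_size)
        if not block:
            return
        yield block


def upload_raw_multipart(s3_client: Any, bucket: str, key: str, stream: Any,
                         part_size: int = MIN_PART_SIZE) -> Dict[str, Any]:
    """Upload the raw bytes of stream as a multipart upload (hypothetical helper)."""
    created = s3_client.create_multipart_upload(Bucket=bucket, Key=key)
    upload_id = created["UploadId"]
    parts: List[Dict[str, Any]] = []
    try:
        # Part numbers start at 1 and are sent in increasing order.
        for part_number, block in enumerate(iter_parts(stream, part_size), start=1):
            resp = s3_client.upload_part(Bucket=bucket, Key=key,
                                         PartNumber=part_number,
                                         UploadId=upload_id, Body=block)
            parts.append({"PartNumber": part_number, "ETag": resp["ETag"]})
        return s3_client.complete_multipart_upload(
            Bucket=bucket, Key=key, UploadId=upload_id,
            MultipartUpload={"Parts": parts})
    except Exception:
        # Abort so that already uploaded parts are not left behind in the bucket.
        s3_client.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise
```

With the code above this might be called as upload_raw_multipart(s3_client, settings.AWS_S3_BUCKET, key_name, io.BytesIO(content_buffer.getvalue())). Once the upload completes, S3 exposes the result as a single object, so it can be fetched with get_object and its bytes handed to pandas in the usual way.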

Reference

About this question (AWS S3: Uploading a Large Excel File): the original post is available at https://dev.to/poojahoney/aws-s3-upload-large-excel-file-1enj
