ETL: Transform Data with Node.js

ETL is a process of extracting, transforming, and loading data from one or more sources into a destination. For a general overview of ETL pipelines, have a look at the article ETL pipeline explained.

This is the second article in a series of three, and it aims to explain the Transform phase of an ETL pipeline.
  • Extract data
  • Transform (this article)
  • Load

  • Transform data in an ETL pipeline

    The second phase in an ETL pipeline is to Transform the extracted data. The data can be completely reformatted in this phase, for example by renaming fields, adding new fields, or filtering data out. The transform phase is responsible for bringing the data into the format its destination expects. In this step you can clean data, standardize values and fields, and aggregate values.
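
    To make cleaning, standardizing and aggregating more concrete, here is a minimal sketch. The rawOrders records and their field names are made up for illustration and are not part of the photo example used in this series.

```javascript
// Hypothetical sample records; the field names are illustrative only.
const rawOrders = [
  { id: 1, country: ' at ', amount: '10.50' },
  { id: 2, country: 'AT', amount: '4.50' },
  { id: 3, country: 'de', amount: null }, // incomplete record
];

// Clean: drop records that lack a required value.
const cleaned = rawOrders.filter(order => order.amount !== null);

// Standardize: trim and uppercase country codes, parse amounts to numbers.
const standardized = cleaned.map(order => ({
  id: order.id,
  country: order.country.trim().toUpperCase(),
  amount: Number(order.amount),
}));

// Aggregate: sum the amounts per country.
const totalsByCountry = standardized.reduce((totals, order) => {
  totals[order.country] = (totals[order.country] || 0) + order.amount;
  return totals;
}, {});

console.log(totalsByCountry); // { AT: 15 }
```

    Each step returns a new array or object instead of mutating its input, which keeps the individual transformations easy to test in isolation.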

    We are going to continue with the example used in the article ETL: Extract Data with Node.js.

    1. Determine the new structure of the data

    The first step in the Transform phase should be to determine what the new data structure should be. In the example we are extracting photo albums, each of which is an array of photo objects. For the transformation, the unneeded thumbnailUrl property should be removed, and a new property name with the value Mario (or whatever string value you like) should be added to each photo object. In addition, a timestamp with the current time should be added to each photo album.

    Old photo object interface:

    interface Photo {
      albumId: number;
      id: number;
      title: string;
      url: string;
      thumbnailUrl: string;
    }
    

    Transformed photo object interface:

    interface Photo {
      albumId: number;
      id: number;
      name: string;
      title: string;
      url: string;
    }
    

    The interface for a photo album is currently an array of photo objects:

    Array<Photo>
    

    New interface for photoAlbums:

    interface PhotoAlbums {
      timeStamp: Date;
      data: Array<Photo>;
    }
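
    Since the interfaces above are only type descriptions, a plain Node.js script does not enforce them. As a rough sketch, a runtime check for the transformed photo shape could look like the following. The helper isTransformedPhoto and the sample values are hypothetical and not part of the original example.

```javascript
// The property names and primitive types expected on a transformed photo.
const expectedPhotoShape = {
  albumId: 'number',
  id: 'number',
  name: 'string',
  title: 'string',
  url: 'string',
};

// Hypothetical helper: true if `photo` has exactly the expected properties
// with the expected primitive types.
function isTransformedPhoto(photo) {
  const expectedKeys = Object.keys(expectedPhotoShape);
  const actualKeys = Object.keys(photo);
  return (
    actualKeys.length === expectedKeys.length &&
    expectedKeys.every(key => typeof photo[key] === expectedPhotoShape[key])
  );
}

// Made-up sample values, used only to illustrate both shapes.
const oldPhoto = {
  albumId: 1,
  id: 1,
  title: 'sample title',
  url: 'https://example.com/photo/1',
  thumbnailUrl: 'https://example.com/thumbnail/1',
};
const newPhoto = {
  albumId: 1,
  id: 1,
  name: 'Mario',
  title: 'sample title',
  url: 'https://example.com/photo/1',
};

console.log(isTransformedPhoto(oldPhoto)); // false
console.log(isTransformedPhoto(newPhoto)); // true
```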
    

    2. Create transform functions

    Create another file transform.js in the project folder, which is going to contain the transform functions.

    touch transform.js
    

    Create a transform function for transforming the photo object. It takes a photo object as an input, returns the needed properties and adds the name property with a string value.

    function transformPhoto(photo) {
      return {
        albumId: photo.albumId,
        id: photo.id,
        name: 'Mario',
        title: photo.title,
        url: photo.url,
      };
    }
    
    module.exports = { transformPhoto };
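
    As an alternative sketch, the same result can be reached with object destructuring and rest syntax. The function name transformPhotoWithRest is made up for this illustration.

```javascript
// Alternative sketch: drop thumbnailUrl via destructuring, keep the rest.
function transformPhotoWithRest(photo) {
  // thumbnailUrl is pulled out and intentionally left unused.
  const { thumbnailUrl, ...rest } = photo;
  return { ...rest, name: 'Mario' };
}

// Made-up sample photo, used only for illustration.
const samplePhoto = {
  albumId: 1,
  id: 1,
  title: 'sample title',
  url: 'https://example.com/photo/1',
  thumbnailUrl: 'https://example.com/thumbnail/1',
};

console.log(transformPhotoWithRest(samplePhoto));
```

    Note the difference in behavior: the explicit version above only lets known properties through, while this variant also passes along any additional properties the source might add later. Which one fits better depends on how strict the destination is.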
    

    A second function is needed to transform the photoAlbum: a timestamp with the current time should be added, and the array of photos should be moved into the new property data.

    function addTimeStamp(photoAlbum) {
      return {
        data: photoAlbum,
        timeStamp: new Date(),
      };
    }
    
    module.exports = { transformPhoto, addTimeStamp };
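
    To see both functions working together, here is a small self-contained sketch. It repeats the two functions so it runs on its own; the wrapper transformPhotoAlbum and the sample album are hypothetical additions.

```javascript
// Same functions as in transform.js, repeated so the sketch runs alone.
function transformPhoto(photo) {
  return {
    albumId: photo.albumId,
    id: photo.id,
    name: 'Mario',
    title: photo.title,
    url: photo.url,
  };
}

function addTimeStamp(photoAlbum) {
  return {
    data: photoAlbum,
    timeStamp: new Date(),
  };
}

// Hypothetical convenience wrapper: transform every photo, then wrap the
// resulting array together with a timestamp.
function transformPhotoAlbum(photoAlbum) {
  return addTimeStamp(photoAlbum.map(photo => transformPhoto(photo)));
}

// Made-up sample album with a single photo.
const sampleAlbum = [
  {
    albumId: 1,
    id: 1,
    title: 'sample title',
    url: 'https://example.com/photo/1',
    thumbnailUrl: 'https://example.com/thumbnail/1',
  },
];

const transformedAlbum = transformPhotoAlbum(sampleAlbum);
console.log(transformedAlbum.timeStamp instanceof Date); // true
console.log(Object.keys(transformedAlbum.data[0]));
// [ 'albumId', 'id', 'name', 'title', 'url' ]
```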
    

    3. Add the Transform phase to the ETL orchestration function

    We are going to use the example with multiple requests for getting photos, since one request is boring. 😀 First, we have to require both functions in index.js, which contains orchestrateEtlPipeline(). After the requests are done, we map over each photo object in each photoAlbum to apply the transformation with the transformPhoto() function. Then we log the result.

    const { getPhotos } = require('./extract');
    const { addTimeStamp, transformPhoto } = require('./transform');
    
    const orchestrateEtlPipeline = async () => {
      try {
        // EXTRACT
        const allPhotoAlbums = Promise.all([
          getPhotos(1),
          getPhotos(2),
          getPhotos(3),
        ]);
        const [
          photoAlbum1,
          photoAlbum2,
          photoAlbum3,
        ] = await allPhotoAlbums;
    
        // TRANSFORM
        let transformedPhotoAlbum1 = photoAlbum1.map(photo =>
          transformPhoto(photo),
        );
        let transformedPhotoAlbum2 = photoAlbum2.map(photo =>
          transformPhoto(photo),
        );
        let transformedPhotoAlbum3 = photoAlbum3.map(photo =>
          transformPhoto(photo),
        );
    
        console.log(
          transformedPhotoAlbum1[0],
          transformedPhotoAlbum2[0],
          transformedPhotoAlbum3[0],
        ); // log first photo object of each transformed photoAlbum
    
        // TODO - LOAD
      } catch (error) {
        console.error(error);
      }
    };
    
    orchestrateEtlPipeline();
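
    The three near-identical map calls can also be written once for any number of albums. The following is only a sketch of that idea: getPhotos is replaced by a stub returning made-up data so the snippet runs without network access, and the helper extractAndTransform is a hypothetical name.

```javascript
// Stub standing in for getPhotos() from extract.js (assumption: the real
// function resolves to an array of photo objects). The values are made up.
const getPhotos = async albumId => [
  {
    albumId,
    id: albumId * 100,
    title: `photo of album ${albumId}`,
    url: 'https://example.com/photo',
    thumbnailUrl: 'https://example.com/thumbnail',
  },
];

// Same transformPhoto() as in transform.js.
function transformPhoto(photo) {
  return {
    albumId: photo.albumId,
    id: photo.id,
    name: 'Mario',
    title: photo.title,
    url: photo.url,
  };
}

// Hypothetical helper: extract and transform any number of albums.
async function extractAndTransform(albumIds) {
  // EXTRACT: request all albums in parallel
  const photoAlbums = await Promise.all(albumIds.map(id => getPhotos(id)));

  // TRANSFORM: apply transformPhoto to every photo of every album
  return photoAlbums.map(photoAlbum =>
    photoAlbum.map(photo => transformPhoto(photo)),
  );
}

extractAndTransform([1, 2, 3])
  .then(albums => {
    // log first photo object of each transformed photoAlbum
    console.log(albums.map(album => album[0]));
  })
  .catch(error => console.error(error));
```

    The explicit version with three variables is easier to follow step by step, while this variant scales to a list of album ids without further changes.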
    

    The transformation of the photo object is complete, and the output should contain just the five properties albumId, id, name, title and url; the thumbnailUrl property should be gone. Now we have to transform the photoAlbum and add the timeStamp. We also log the timestamp.

    const { getPhotos } = require('./extract');
    const { addTimeStamp, transformPhoto } = require('./transform');
    
    const orchestrateEtlPipeline = async () => {
      try {
        // EXTRACT
        const allPhotoAlbums = Promise.all([
          getPhotos(1),
          getPhotos(2),
          getPhotos(3),
        ]);
        const [
          photoAlbum1,
          photoAlbum2,
          photoAlbum3,
        ] = await allPhotoAlbums;
    
        // TRANSFORM
        let transformedPhotoAlbum1 = photoAlbum1.map(photo =>
          transformPhoto(photo),
        );
        let transformedPhotoAlbum2 = photoAlbum2.map(photo =>
          transformPhoto(photo),
        );
        let transformedPhotoAlbum3 = photoAlbum3.map(photo =>
          transformPhoto(photo),
        );
    
        console.log(
          transformedPhotoAlbum1[0],
          transformedPhotoAlbum2[0],
          transformedPhotoAlbum3[0],
        ); // log first photo object of each transformed photoAlbum
    
        transformedPhotoAlbum1 = addTimeStamp(transformedPhotoAlbum1);
        transformedPhotoAlbum2 = addTimeStamp(transformedPhotoAlbum2);
        transformedPhotoAlbum3 = addTimeStamp(transformedPhotoAlbum3);
    
        console.log(
          transformedPhotoAlbum1.timeStamp,
          transformedPhotoAlbum2.timeStamp,
          transformedPhotoAlbum3.timeStamp,
        ); // log timestamp
        console.log(transformedPhotoAlbum1);
    
        // TODO - LOAD
      } catch (error) {
        console.error(error);
      }
    };
    
    orchestrateEtlPipeline();
    

    After the last step is finished, we are ready for the next phase of the ETL pipeline, Load, which handles loading the transformed data into its destination.


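    TL;DR The second phase in an ETL pipeline is to transform the data. The first step in the transform phase is to determine what the new data structure should be. The second step is to transform the data into the desired format. Thanks for reading, and if you have any questions, use the comment function or send me a message.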

    If you want to know more about

    References (and big thanks):

    Node, Node Tutorials, HeyNode
