ETL: Transform Data with Node.js
This is the second article in a three-part series, and it explains the transform phase of an ETL pipeline.
Transforming data in an ETL pipeline
The second phase in an ETL pipeline is to Transform the extracted data. In this phase the data can be completely reformatted, for example by renaming fields, adding new fields, or filtering data out. The transform phase is responsible for bringing the data into the format its destination expects. In this step you can also clean data, standardize values and fields, and aggregate values.
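To make these operations concrete, here is a minimal sketch in plain JavaScript that renames a field, standardizes a value, filters records out, and aggregates the rest. The record fields (user_name, country, amount) and all helper names are hypothetical and are not part of the photo example used in this article.

```javascript
// Hypothetical input records; not from the article's photo API.
const raw = [
  { user_name: 'ana', country: ' de ', amount: 10 },
  { user_name: 'ben', country: 'DE', amount: 5 },
  { user_name: 'cy', country: 'at', amount: 0 },
];

// Rename a field: user_name becomes userName, the old key is dropped.
function renameField(record) {
  const { user_name: userName, ...rest } = record;
  return { ...rest, userName };
}

// Standardize a value: trim whitespace and upper-case the country code.
function standardizeCountry(record) {
  return { ...record, country: record.country.trim().toUpperCase() };
}

// Filter: keep only records with a positive numeric amount.
function hasPositiveAmount(record) {
  return typeof record.amount === 'number' && record.amount > 0;
}

// Aggregate: sum the amounts per country.
function totalByCountry(records) {
  return records.reduce((totals, { country, amount }) => {
    totals[country] = (totals[country] || 0) + amount;
    return totals;
  }, {});
}

const cleaned = raw
  .filter(hasPositiveAmount)
  .map(renameField)
  .map(standardizeCountry);

console.log(totalByCountry(cleaned)); // { DE: 15 }
```

Filtering first keeps the later steps from doing work on records that are thrown away anyway.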
We are going to continue with the example used in the article ETL: Extract Data with Node.js.
1. Determine the new structure of the data
The first step in the Transform phase should be to determine what the new data structure should be. In the example, we extract photo albums, each of which is an array of photo objects. For the transformation, the unneeded property thumbnailUrl should be removed, and a new property name with the value Mario (or whatever string value you like) should be added to each photo object. In addition, a timestamp with the current time should be added to each photo album.
Interface of the original photo object:
interface Photo {
albumId: number;
id: number;
title: string;
url: string;
thumbnailUrl: string;
}
Interface of the transformed photo object:
interface Photo {
albumId: number;
id: number;
name: string;
title: string;
url: string;
}
Currently, a photo album is just an array of photo objects:
Array<Photo>
New interface for a photo album (the property is named timeStamp to match the code below):
interface PhotoAlbums {
timeStamp: Date;
data: Array<Photo>;
}
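Since the interfaces above only exist on paper in a plain JavaScript project, a small runtime check can confirm that transformed data really has the new structure. This is a hypothetical helper, not part of the original article. It assumes the album property is spelled timeStamp, which is the name used by addTimeStamp() and the logging code later in this article.

```javascript
// Keys of the transformed Photo interface, in sorted order.
const TRANSFORMED_PHOTO_KEYS = ['albumId', 'id', 'name', 'title', 'url'];

// True only if the photo has exactly these keys (so thumbnailUrl is gone).
function hasTransformedPhotoShape(photo) {
  const keys = Object.keys(photo).sort();
  return (
    keys.length === TRANSFORMED_PHOTO_KEYS.length &&
    keys.every((key, i) => key === TRANSFORMED_PHOTO_KEYS[i])
  );
}

// True if the album has a Date timeStamp and a data array of transformed photos.
function hasPhotoAlbumShape(album) {
  return (
    album.timeStamp instanceof Date &&
    Array.isArray(album.data) &&
    album.data.every(hasTransformedPhotoShape)
  );
}

const photo = { albumId: 1, id: 1, name: 'Mario', title: 'example', url: 'https://example.com/1.png' };
console.log(hasPhotoAlbumShape({ timeStamp: new Date(), data: [photo] })); // true
```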
2. Create the transform functions
Create another file, transform.js, in the project folder. It will contain the transform functions.
touch transform.js
Create a transform function for the photo object. It takes a photo object as input and returns a new object that keeps only the needed properties and adds the name property with a string value.
function transformPhoto(photo) {
return {
albumId: photo.albumId,
id: photo.id,
name: 'Mario',
title: photo.title,
url: photo.url,
};
}
module.exports = { transformPhoto };
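As an alternative sketch, transformPhoto() can drop thumbnailUrl with object rest destructuring instead of listing every property. This is a swapped-in variant, not the article's version: it keeps any other fields the API returns, which may or may not be what you want, and name ends up as the last key.

```javascript
// Alternative transformPhoto: remove thumbnailUrl, keep everything else.
function transformPhoto(photo) {
  // thumbnailUrl is pulled out only so that it is excluded from `rest`.
  const { thumbnailUrl, ...rest } = photo;
  return { ...rest, name: 'Mario' };
}

console.log(
  transformPhoto({
    albumId: 1,
    id: 1,
    title: 'example',
    url: 'https://example.com/1.png',
    thumbnailUrl: 'https://example.com/1-thumb.png',
  }),
);
```

The explicit version in the article is stricter: it guarantees the output matches the new interface even if the source adds fields later.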
A second function is needed to transform the photoAlbum: it adds a timestamp with the current time and moves the array of photos into the new property data.
function addTimeStamp(photoAlbum) {
return {
data: photoAlbum,
timeStamp: new Date(),
};
}
module.exports = { transformPhoto, addTimeStamp };
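Because addTimeStamp() calls new Date() internally, its output changes on every run. A minimal sketch of one way to make it testable: the now parameter below is a hypothetical addition that defaults to the current time, so existing calls such as addTimeStamp(album) behave exactly as before.

```javascript
// addTimeStamp with an optional, injectable timestamp (hypothetical `now`).
function addTimeStamp(photoAlbum, now = new Date()) {
  return {
    data: photoAlbum,
    timeStamp: now,
  };
}

// Usage with a fixed date (example values only).
const fixed = new Date('2022-01-01T00:00:00Z');
const album = addTimeStamp([{ albumId: 1, id: 1, name: 'Mario', title: 't', url: 'u' }], fixed);
console.log(album.timeStamp.toISOString()); // 2022-01-01T00:00:00.000Z
```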
3. Add the transform step to the ETL orchestration function
We are going to use the example with multiple requests for getting photos, since one request is boring. 😀 Now we have to require both functions in index.js, which contains orchestrateEtlPipeline(). After the requests are done, we map over each photo object in each photoAlbum and apply the transformation with the transformPhoto() function. Then we log the result.
const { getPhotos } = require('./extract');
const { addTimeStamp, transformPhoto } = require('./transform');
const orchestrateEtlPipeline = async () => {
try {
// EXTRACT
const allPhotoAlbums = Promise.all([
getPhotos(1),
getPhotos(2),
getPhotos(3),
]);
const [
photoAlbum1,
photoAlbum2,
photoAlbum3,
] = await allPhotoAlbums;
// TRANSFORM
let transformedPhotoAlbum1 = photoAlbum1.map(photo =>
transformPhoto(photo),
);
let transformedPhotoAlbum2 = photoAlbum2.map(photo =>
transformPhoto(photo),
);
let transformedPhotoAlbum3 = photoAlbum3.map(photo =>
transformPhoto(photo),
);
console.log(
transformedPhotoAlbum1[0],
transformedPhotoAlbum2[0],
transformedPhotoAlbum3[0],
); // log first photo object of each transformed photoAlbum
// TODO - LOAD
} catch (error) {
console.error(error);
}
};
orchestrateEtlPipeline();
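The code above repeats one variable and one map() call per album. A sketch of a variant that handles any number of album ids follows. To keep it runnable on its own, getPhotos is a hypothetical stand-in for the function in extract.js that resolves with one fake photo instead of making an HTTP request, and transformPhoto is copied from transform.js; extractAndTransform is also a hypothetical name.

```javascript
// Copy of transformPhoto from transform.js so this sketch runs alone.
function transformPhoto(photo) {
  return { albumId: photo.albumId, id: photo.id, name: 'Mario', title: photo.title, url: photo.url };
}

// Hypothetical stand-in for getPhotos() from extract.js (no network call).
function getPhotos(albumId) {
  return Promise.resolve([
    {
      albumId,
      id: albumId * 100,
      title: 'example',
      url: 'https://example.com/p.png',
      thumbnailUrl: 'https://example.com/t.png',
    },
  ]);
}

// EXTRACT all albums in parallel, then TRANSFORM every photo in every album.
async function extractAndTransform(albumIds) {
  const photoAlbums = await Promise.all(albumIds.map(id => getPhotos(id)));
  return photoAlbums.map(album => album.map(transformPhoto));
}

extractAndTransform([1, 2, 3]).then(albums => {
  // log first photo object of each transformed album
  console.log(albums.map(album => album[0]));
});
```

In index.js you would require the real getPhotos and transformPhoto instead of the stand-ins; Promise.all still rejects as soon as one request fails, just like the original version.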
The transformation of the photo object is complete: the output should contain only the five properties albumId, id, name, title and url, and the thumbnailUrl property should be gone. Next, we transform each photoAlbum by adding the timeStamp, and we log the timestamps as well.
const { getPhotos } = require('./extract');
const { addTimeStamp, transformPhoto } = require('./transform');
const orchestrateEtlPipeline = async () => {
try {
// EXTRACT
const allPhotoAlbums = Promise.all([
getPhotos(1),
getPhotos(2),
getPhotos(3),
]);
const [
photoAlbum1,
photoAlbum2,
photoAlbum3,
] = await allPhotoAlbums;
// TRANSFORM
let transformedPhotoAlbum1 = photoAlbum1.map(photo =>
transformPhoto(photo),
);
let transformedPhotoAlbum2 = photoAlbum2.map(photo =>
transformPhoto(photo),
);
let transformedPhotoAlbum3 = photoAlbum3.map(photo =>
transformPhoto(photo),
);
console.log(
transformedPhotoAlbum1[0],
transformedPhotoAlbum2[0],
transformedPhotoAlbum3[0],
); // log first photo object of each transformed photoAlbum
transformedPhotoAlbum1 = addTimeStamp(transformedPhotoAlbum1);
transformedPhotoAlbum2 = addTimeStamp(transformedPhotoAlbum2);
transformedPhotoAlbum3 = addTimeStamp(transformedPhotoAlbum3);
console.log(
transformedPhotoAlbum1.timeStamp,
transformedPhotoAlbum2.timeStamp,
transformedPhotoAlbum3.timeStamp,
); // log timestamp
console.log(transformedPhotoAlbum1);
// TODO - LOAD
} catch (error) {
console.error(error);
}
};
orchestrateEtlPipeline();
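The two transforms can also be composed so that each album goes through a single call. In this sketch, transformPhotoAlbum is a hypothetical helper name, and both transform functions are copied from transform.js so the block runs on its own.

```javascript
// Copies of the two transform functions from transform.js.
function transformPhoto(photo) {
  return { albumId: photo.albumId, id: photo.id, name: 'Mario', title: photo.title, url: photo.url };
}

function addTimeStamp(photoAlbum) {
  return { data: photoAlbum, timeStamp: new Date() };
}

// Photo-level transform first, then the album-level wrapper.
const transformPhotoAlbum = photoAlbum => addTimeStamp(photoAlbum.map(transformPhoto));

const example = transformPhotoAlbum([{ albumId: 1, id: 1, title: 't', url: 'u', thumbnailUrl: 'x' }]);
console.log(example.timeStamp instanceof Date, example.data[0].name); // true Mario
```

The order matters: addTimeStamp() wraps the array in an object, so the per-photo map() has to run before it.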
With this step finished, we are ready for the next phase of the ETL pipeline, Load, which handles loading the transformed data into its destination.
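TL;DR The second phase of an ETL pipeline is to transform the data. The first step of the transform phase is to determine what the new data structure should be. The second step is to transform the data into the desired format. Thanks for reading! If you have any questions, use the comment function or send me a message.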
If you want to know more about Node, have a look at these Node Tutorials.
References (and big thanks):
Node, Node Tutorials, HeyNode
Original article: https://dev.to/mariokandut/etl-transform-data-with-node-js-3j52