How to use streams to ETL data?
Streams in Node.js
This is the fifth article of a series about streams in Node.js. This article is about how to perform ETL operations (Extract, Transform, Load) on CSV data using streams.
Overview
When working with flat data, we can just use the fs module and streams to process the data in a memory-efficient way. Instead of reading all the data into memory, we can read it in small chunks with the help of streams and avoid overconsuming memory.
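As a minimal sketch of what reading in chunks looks like (the file name and the highWaterMark value are just assumptions for the demo):

const fs = require('fs');

// read the file in 64 KB chunks instead of loading it all at once
const stream = fs.createReadStream('big-file.csv', { highWaterMark: 64 * 1024 });

let chunks = 0;
stream.on('data', chunk => {
  chunks += 1;
  console.log(`chunk ${chunks}: ${chunk.length} bytes`);
});
stream.on('end', () => console.log(`done after ${chunks} chunks`));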
In this article we are going to create sample data in a CSV file, extract the data, transform it, and load it.
A Comma-Separated Values (CSV) file is a delimited text file that uses a comma to separate values.

We will transform the CSV data to JSON, or better, to ndjson, which is basically a file of JSON records separated by newlines, with the file extension .ndjson. Surely you are asking yourself: why don't we just use JSON? The main reason is fault tolerance. If even a single invalid record is written to a JSON file, the whole file is corrupted. The main difference between JSON and ndjson is that in an ndjson file each line must contain a single JSON record. Hence, an ndjson file contains valid JSON lines, but the file as a whole is not a valid JSON document. The ndjson format works well with streaming data and large data sets where each record is processed individually (see the sketch after the step list).

We are going to:

1. Create CSV data
2. Initialize the project for NPM
3. Create a CSV parser
4. Add a transform stream
5. Run & Done
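To make the fault-tolerance argument concrete, here is a minimal sketch (with hypothetical records) that processes ndjson line by line; a single broken record is skipped, while the rest of the data stays usable:

const lines = [
  '{"id":"100","firstName":"Jobi"}',
  '{"id":"101","firstName"', // a broken record
  '{"id":"102","firstName":"Arlina"}',
];

for (const line of lines) {
  try {
    console.log(JSON.parse(line).id); // logs 100 and 102
  } catch (err) {
    // only this one record is lost, parsing continues with the next line
  }
}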
1. Create CSV data
Let's create some sample CSV data. You can use the sample data below, or create your own data with FakerJS and convert it to CSV.

id,firstName,lastName,email,email2,randomized
100,Jobi,Taam,[email protected],[email protected],Z lsmDLjL
101,Dacia,Elephus,[email protected],[email protected],Za jfPaJof
102,Arlina,Bibi,[email protected],[email protected],zmzlfER
103,Lindie,Torray,[email protected],[email protected],ibVggFEh
104,Modestia,Leonard,[email protected],[email protected]," Tit KCrdh"
105,Karlee,Cornelia,[email protected],[email protected],PkQCUXzq
106,Netty,Travax,[email protected],[email protected],psJKWDBrXm
107,Dede,Romelda,[email protected],[email protected],heUrfT
108,Sissy,Crudden,[email protected],[email protected],cDJxC
109,Sherrie,Sekofski,[email protected],[email protected],dvYHUJ
110,Sarette,Maryanne,[email protected],[email protected],rskGIJNF
111,Selia,Waite,[email protected],[email protected],DOPBe
112,Karly,Tjon,[email protected],[email protected],zzef nCMVL
113,Sherrie,Berriman,[email protected],[email protected],rQqmjw
114,Nadine,Greenwald,[email protected],[email protected],JZsmKafeIf
115,Antonietta,Gino,[email protected],[email protected],IyuCBqwlj
116,June,Dorothy,[email protected],[email protected],vyCTyOjt
117,Belva,Merriott,[email protected],[email protected],MwwiGEjDfR
118,Robinia,Hollingsworth,[email protected],[email protected],wCaIu
119,Dorthy,Pozzy,[email protected],[email protected],fmWOUCIM
120,Barbi,Buffum,[email protected],[email protected],VOZEKSqrZa
121,Priscilla,Hourigan,[email protected],[email protected],XouVGeWwJ
122,Tarra,Hunfredo,[email protected],[email protected],NVzIduxd
123,Madalyn,Westphal,[email protected],[email protected],XIDAOx
124,Ruthe,McAdams,[email protected],[email protected],iwVelLKZH
125,Maryellen,Brotherson,[email protected],[email protected],nfoiVBjjqw
126,Shirlee,Mike,[email protected],[email protected],MnTkBSFDfo
127,Orsola,Giule,[email protected],[email protected],VPrfEYJi
128,Linzy,Bennie,[email protected],[email protected],ZHctp
129,Vanessa,Cohdwell,[email protected],[email protected],RvUcbJihHf
130,Jaclyn,Salvidor,[email protected],[email protected],gbbIxz
131,Mildrid,Pettiford,[email protected],[email protected],snyeV
132,Carol-Jean,Eliathas,[email protected],[email protected],EAAjYHiij
133,Susette,Ogren,[email protected],[email protected]," BhYgr"
134,Farrah,Suanne,[email protected],[email protected],hYZbZIc
135,Cissiee,Idelia,[email protected],[email protected],PNuxbvjx
136,Alleen,Clara,[email protected],[email protected],YkonJWtV
137,Merry,Letsou,[email protected],[email protected],sLfCumcwco
138,Fanny,Clywd,[email protected],[email protected],Go kx
139,Trixi,Pascia,[email protected],[email protected],lipLcqRAHr
140,Sandie,Quinn,[email protected],[email protected],KrGazhI
141,Dania,Wenda,[email protected],[email protected],CXzs kDv
142,Kellen,Vivle,[email protected],[email protected],RrKPYqq
143,Jany,Whittaker,[email protected],[email protected],XAIufn
144,Lusa,Fillbert,[email protected],[email protected],FBFQnPm
145,Farrah,Edee,[email protected],[email protected],TrCwKb
146,Felice,Peonir,[email protected],[email protected],YtVZywf
147,Starla,Juan,[email protected],[email protected],aUTvjVNyw
148,Briney,Elvyn,[email protected],[email protected],tCEvgeUbwF
149,Marcelline,Ricarda,[email protected],[email protected],sDwIlLckbd
150,Mureil,Rubie,[email protected],[email protected],HbcfbKd
151,Nollie,Dudley,[email protected],[email protected],EzjjrNwVUm
152,Yolane,Melony,[email protected],[email protected],wfqSgpgL
153,Brena,Reidar,[email protected],[email protected],iTlvaS
154,Glenda,Sabella,[email protected],[email protected],zzaWxeI
155,Paola,Virgin,[email protected],[email protected],gJO hXTWZl
156,Aryn,Erich,[email protected],[email protected],qUoLwH
157,Tiffie,Borrell,[email protected],[email protected],cIYuVMHwF
158,Anestassia,Daniele,[email protected],[email protected],JsDbQbc
159,Ira,Glovsky,[email protected],[email protected],zKITnYXyhC
160,Sara-Ann,Dannye,[email protected],[email protected],wPClmU
161,Modestia,Zina,[email protected],[email protected],YRwcMqPK
162,Kelly,Poll,[email protected],[email protected],zgklmO
163,Ernesta,Swanhildas,[email protected],[email protected],tWafP
164,Giustina,Erminia,[email protected],[email protected],XgOKKAps
165,Jerry,Kravits,[email protected],[email protected],olzBzS
166,Magdalena,Khorma,[email protected],[email protected],BBKPB
167,Lory,Pacorro,[email protected],[email protected],YmWQB
168,Carilyn,Ethban,[email protected],[email protected],KUXenrJh
169,Tierney,Swigart,[email protected],[email protected],iQCQJ
170,Beverley,Stacy,[email protected],[email protected],NMrS Zpa f
171,Ida,Dex,[email protected],[email protected],hiIgOCxNg
172,Sam,Hieronymus,[email protected],[email protected],dLSkVe
173,Lonnie,Colyer,[email protected],[email protected],ZeDosRy
174,Rori,Ethban,[email protected],[email protected],SXFZQmX
175,Lelah,Niles,[email protected],[email protected],NwxvCXeszl
176,Kathi,Hepsibah,[email protected],[email protected],SOcAOSn
177,Dominga,Cyrie,[email protected],[email protected],IkjDyuqK
178,Pearline,Bakerman,[email protected],[email protected],vHVCkQ
179,Selma,Gillan,[email protected],[email protected],hSZgpBNsw
180,Bernardine,Muriel,[email protected],[email protected],AnSDTDa U
181,Ermengarde,Hollingsworth,[email protected],[email protected],IYQZ Nmv
182,Marguerite,Newell,[email protected],[email protected],kSaD uaHH
183,Albertina,Nisbet,[email protected],[email protected],Y jHyluB
184,Chere,Torray,[email protected],[email protected],loElYdo
185,Vevay,O'Neill,Vevay.O'[email protected],Vevay.O'[email protected],uLZSdatVn
186,Ann-Marie,Gladstone,[email protected],[email protected],fwKlEksI
187,Donnie,Lymann,[email protected],[email protected],deBrqXyyjf
188,Myriam,Posner,[email protected],[email protected],gEMZo
189,Dale,Pitt,[email protected],[email protected],OeMdG
190,Cindelyn,Thornburg,[email protected],[email protected],kvhFmKGoMZ
191,Maisey,Hertzfeld,[email protected],[email protected],OajjJ
192,Corina,Heisel,[email protected],[email protected],luoDJeHo
193,Susette,Marcellus,[email protected],[email protected],AXHtR AyV
194,Lanae,Sekofski,[email protected],[email protected],FgToedU
195,Linet,Beebe,[email protected],[email protected],DYGfRP
196,Emilia,Screens,[email protected],[email protected],LXUcleSs
197,Tierney,Avi,[email protected],[email protected],VegzbHH
198,Pollyanna,Thar,[email protected],[email protected],GjYeEGK
199,Darci,Elephus,[email protected],[email protected],DaQNdN
Create the project folder:
mkdir node-streams-etl
Create a data folder with the csv file in it (the code below reads from data/):

cd node-streams-etl
mkdir data
touch data/sample-data.csv
Copy all the sample data into the csv file and save it. You can copy and paste, use fs.writeFile in the REPL, or use node with the -p flag in the terminal, as in the sketch below.
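For example with fs.writeFile, run in the REPL or as a throwaway script (a sketch; paste the remaining rows into the template literal):

const fs = require('fs');

// only the first rows are shown here, add the rest of the sample data
const csvData = `id,firstName,lastName,email,email2,randomized
100,Jobi,Taam,[email protected],[email protected],Z lsmDLjL
101,Dacia,Elephus,[email protected],[email protected],Za jfPaJof`;

fs.writeFile('data/sample-data.csv', csvData, err => {
  if (err) throw err;
  console.log('sample data saved');
});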
2. Initialize the project for NPM
We are going to use npm packages; hence, we have to initialize the project to get a package.json.
npm init -y
Let's add a main file for the code.
touch index.js
First, we create a readable stream from the file sample-data.csv, and a writable stream, which will be the destination. For now, we just copy the sample data. To connect the inputStream and the outputStream, we are going to use the pipeline method, because error handling is much easier than with the pipe method (see the comparison sketch after the code). Check out the article How to Connect streams with the pipeline method.

const fs = require('fs');
const { pipeline } = require('stream');
const inputStream = fs.createReadStream('data/sample-data.csv');
const outputStream = fs.createWriteStream('data/sample-data.ndjson');
pipeline(inputStream, outputStream, err => {
  if (err) {
    console.log('Pipeline encountered an error.', err);
  } else {
    console.log('Pipeline completed successfully.');
  }
});
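For comparison, getting similar error handling with the pipe method requires manual work, roughly like this sketch:

// pipe does not forward errors, so each stream needs its own handler,
// and the other stream must be destroyed manually to avoid leaks
inputStream
  .on('error', err => {
    console.log('Read failed.', err);
    outputStream.destroy();
  })
  .pipe(outputStream)
  .on('error', err => {
    console.log('Write failed.', err);
    inputStream.destroy();
  });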
3. Create a CSV parser
We have to convert the CSV file to JSON and, as so often, for every problem there is a package. In this use-case, there is csvtojson. The module parses the header row to get the keys and then parses each row to create a JSON object. Let's install it.
npm install csvtojson
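If you want a quick look at what the parser produces before wiring it into the pipeline, csvtojson also offers a promise-based fromString method (a small sketch with two inline rows):

const csv = require('csvtojson');

csv()
  .fromString('id,firstName\n100,Jobi\n101,Dacia')
  .then(rows => {
    // [ { id: '100', firstName: 'Jobi' }, { id: '101', firstName: 'Dacia' } ]
    console.log(rows);
  });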
After the installation succeeds, we can require the module and add the parser to the pipeline, after the inputStream. The data flows from the CSV file into the CSV parser and then into the output file.

We use the pipeline method because, since Node.js v10, it is the preferred way to connect streams and pipe data between them. It also helps to clean up the streams on completion or failure, since the involved streams are destroyed when an error occurs, which avoids memory leaks.

const fs = require('fs');
const { pipeline } = require('stream');
const csv = require('csvtojson');
const inputStream = fs.createReadStream('data/sample-data.csv');
const outputStream = fs.createWriteStream('data/sample-data.ndjson');
const csvParser = csv();
pipeline(inputStream, csvParser, outputStream, err => {
  if (err) {
    console.log('Pipeline encountered an error.', err);
  } else {
    console.log('Pipeline completed successfully.');
  }
});
4. Add a transform stream
The data is now emitted to the outputStream as ndjson, with each data row being a valid JSON record. Now we want to transform the data. Since we are using csvtojson, we could utilize the built-in subscribe method, which is called for each record after it has been parsed (see the sketch below). However, we want to create a transform stream. Our sample data has the keys id, firstName, lastName, email, email2 and randomized. We want to get rid of the randomized property and the duplicated email field in each entry, and rename email2 to emailBusiness.
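For reference, the subscribe variant could look roughly like this sketch (mutating the parsed record changes what csvtojson emits downstream); we will build a transform stream instead:

// subscribe is called once per parsed row
const csvParser = csv().subscribe(json => {
  json.emailBusiness = json.email2;
  delete json.email2;
  delete json.email;
  delete json.randomized;
});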
Transform streams must implement a transform method that receives a chunk of data as the first argument. It also receives the encoding type of the chunk and a callback function.
const { Transform } = require('stream');

const transformStream = new Transform({
  transform(chunk, encoding, cb) {
    try {
      // each chunk is one JSON record emitted by the csv parser
      const person = JSON.parse(chunk);
      // keep id and the names, drop randomized and email, rename email2
      const transformed = {
        id: person.id,
        firstName: person.firstName,
        lastName: person.lastName,
        emailBusiness: person.email2,
      };
      cb(null, JSON.stringify(transformed) + '\n');
    } catch (err) {
      cb(err);
    }
  },
});
Now let's add the transformStream to the pipeline.
pipeline(
  inputStream,
  csvParser,
  transformStream,
  outputStream,
  err => {
    if (err) {
      console.log('Pipeline encountered an error.', err);
    } else {
      console.log('Pipeline completed successfully.');
    }
  },
);
5. Run & Done
Run the application with node index.js and the data in the ndjson file should look like this.
{"id":"100","firstName":"Jobi","lastName":"Taam","emailBusiness":"[email protected]"}
{"id":"101","firstName":"Dacia","lastName":"Elephus","emailBusiness":"[email protected]"}
{"id":"102","firstName":"Arlina","lastName":"Bibi","emailBusiness":"[email protected]"}
Error handling always has to be done when working with streams. Since the pipeline method already takes care of error handling for all connected streams, the sample project is done.
Congratulations. 🚀✨
TL;DR
- The newline-delimited JSON (ndjson) format works well with streaming data and large data sets where each record is processed individually, and a single invalid record does not corrupt the whole file.
- Using pipeline simplifies error handling and stream cleanup, and it makes combining streams more readable and maintainable.
Thanks for reading and if you have any questions, use the comment function or send me a message.
If you want to know more about Node, check out these Node Tutorials.

References (and many thanks to):
HeyNode, Node.js - Streams, MDN - Streams, Format and MIME Type, ndjson, csvtojson
The original article is available at https://dev.to/mariokandut/how-to-use-streams-to-etl-data-2169.