MapReduce job Shuffle 프로세스의 ERROR

1. 오류 설명

error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#43at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)at java.security.AccessController.doPrivileged(Native Method)at javax.security.auth.Subject.doAs(Subject.java:415)at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1550)at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) Caused by: java.lang.OutOfMemoryError: Java heap spaceat org.apache.hadoop.io.BoundedByteArrayOutputStream.(BoundedByteArrayOutputStream.java:56)at org.apache.hadoop.io.BoundedByteArrayOutputStream.(BoundedByteArrayOutputStream.java:46)at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.(InMemoryMapOutput.java:63)at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:297)at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:287)at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:414)at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:343)at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:166)

2. 문제 해결

error의 stack 정보를 보면 mapreduce의jobshuffle 과정에서 발생한 OOM입니다. 맵의 출력 데이터는reduce 노드에load가 너무 많아서 OOM이 발생하기 때문에reduce의shuffle를 사용합니다.input는 기본 0.7에서 0.6으로 바뀌었습니다. 이 매개 변수의 변화는 실제 상황에 따라 결정해야 합니다.

  conf.set("mapreduce.reduce.shuffle.input.buffer.percent", "0.6");

3. 관련 매개 변수

mapreduce.reduce.shuffle.input.buffer.percent :

the percentage of the reducer's heap memory to be allocated for the circular buffer to store the intermedite outputs copied from multiple mappers.

mapreduce.reduce.shuffle.memory.limit.percent :

the maximum percentage of the above memory buffer that a single shuffle (output copied from single Map task) should take. The shuffle's size above this size will not be copied to the memory buffer, instead they will be directly written to the disk of the reducer.

mapreduce.reduce.shuffle.merge.percent:

the threshold percentage by where the in-memory merger thread will run to merge the available shuffle contents on the memory buffer into a single file and immediately spills the merged file into the disk.

이 내용에 흥미가 있습니까?

현재 기사가 여러분의 문제를 해결하지 못하는 경우 AI 엔진은 머신러닝 분석(스마트 모델이 방금 만들어져 부정확한 경우가 있을 수 있음)을 통해 가장 유사한 기사를 추천합니다:

kafka connect e elasticsearch를 관찰할 수 있습니다.

No menu lateral do dashboard tem a opção de connectors onde ele mostra todos os clusters do kafka connect conectados atu...

텍스트를 자유롭게 공유하거나 복사할 수 있습니다.하지만 이 문서의 URL은 참조 URL로 남겨 두십시오.

CC BY-SA 2.5, CC BY-SA 3.0 및 CC BY-SA 4.0에 따라 라이센스가 부여됩니다.

좋은 웹페이지 즐겨찾기

개발자 우수 사이트 수집

개발자가 알아야 할 필수 사이트 100선 추천 우리는 당신을 위해 100개의 자주 사용하는 개발자 학습 사이트를 정리했습니다