MapReduce job Shuffle 프로세스의 ERROR

2540 단어 elasticsearchhadoop

1. 오류 설명


error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#43at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)at java.security.AccessController.doPrivileged(Native Method)at javax.security.auth.Subject.doAs(Subject.java:415)at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1550)at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) Caused by: java.lang.OutOfMemoryError: Java heap spaceat org.apache.hadoop.io.BoundedByteArrayOutputStream.(BoundedByteArrayOutputStream.java:56)at org.apache.hadoop.io.BoundedByteArrayOutputStream.(BoundedByteArrayOutputStream.java:46)at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.(InMemoryMapOutput.java:63)at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:297)at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:287)at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:414)at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:343)at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:166)

2. 문제 해결


           
error의 stack 정보를 보면 mapreduce의jobshuffle 과정에서 발생한 OOM입니다. 맵의 출력 데이터는reduce 노드에load가 너무 많아서 OOM이 발생하기 때문에reduce의shuffle를 사용합니다.input는 기본 0.7에서 0.6으로 바뀌었습니다. 이 매개 변수의 변화는 실제 상황에 따라 결정해야 합니다.
  conf.set("mapreduce.reduce.shuffle.input.buffer.percent", "0.6");

3. 관련 매개 변수

  • mapreduce.reduce.shuffle.input.buffer.percent :

  • the percentage of the reducer's heap memory to be allocated for the circular buffer to store the intermedite outputs copied from multiple mappers.
  • mapreduce.reduce.shuffle.memory.limit.percent :

  • the maximum percentage of the above memory buffer that a single shuffle (output copied from single Map task) should take. The shuffle's size above this size will not be copied to the memory buffer, instead they will be directly written to the disk of the reducer.
  • mapreduce.reduce.shuffle.merge.percent:

  • the threshold percentage by where the in-memory merger thread will run to merge the available shuffle contents on the memory buffer into a single file and immediately spills the merged file into the disk.

    좋은 웹페이지 즐겨찾기