NodeManager startup error

NodeManager exception: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: checksum mismatch
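System environment: Ubuntu 14.04 server, Cloudera CDH 5.10, 28 nodes in total. Each node has 128G of memory. 6 machines have 24 cores and 22 machines have 32 cores. 10 machines have 1.8T of disk and 18 machines have 12.8T.
The ResourceManager and NameNode run on the same node, which does not host the DataNode or NodeManager services.
The other 28 nodes all run both the DataNode and NodeManager services.
Recently the cluster was used for a TB-scale data randomization experiment. HDFS data ended up unevenly distributed across the nodes, and the disks on all 10 of the 1.8T machines filled up. The YARN NodeManagers on those nodes no longer had enough space: 9 NodeManagers could not serve properly, and 1 NodeManager exited and could not be started again.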
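To see why the small-disk nodes fill up long before the cluster as a whole does, here is a rough back-of-the-envelope sketch. It uses the node counts and disk sizes quoted above and assumes, purely for illustration, that every node receives about the same amount of data; it ignores replication details and non-HDFS disk usage. The function name `capacity_summary` is made up for this example.

```python
# Figures quoted in the post: 10 nodes with 1.8T disks, 18 nodes with 12.8T disks.
SMALL_NODES, SMALL_TB = 10, 1.8
LARGE_NODES, LARGE_TB = 18, 12.8


def capacity_summary():
    """Return (total raw TB, share held by small nodes, cluster fill level
    at which the small nodes are already full).

    Assumption (illustrative only): data lands evenly on every node, so the
    small nodes are full as soon as each node holds SMALL_TB.
    """
    total_tb = SMALL_NODES * SMALL_TB + LARGE_NODES * LARGE_TB
    small_share = SMALL_NODES * SMALL_TB / total_tb
    stored_when_small_full = (SMALL_NODES + LARGE_NODES) * SMALL_TB
    return total_tb, small_share, stored_when_small_full / total_tb


if __name__ == "__main__":
    total, share, fill = capacity_summary()
    print(f"raw capacity: {total:.1f} TB")
    print(f"small nodes hold {share:.1%} of it")
    print(f"small nodes are full at {fill:.1%} cluster usage")
```

Under that even-spread assumption the ten 1.8T machines account for only a small fraction of the raw capacity, yet they hit 100% while the cluster as a whole is still mostly empty, which matches the symptom described above.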
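The plan was to decommission the DataNode role on the 10 low-capacity machines so that their data would be redistributed to the large-capacity machines, and then delete the HDFS data on them.

While the decommission was running, I added one more machine to the cluster (32 cores, 128G of memory, 1.8T disk) and assigned it the NodeManager role.
After all of the above completed successfully, I restarted YARN.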

Most machines came up normally. Only the NodeManager that had exited earlier failed to start. Its log shows:
FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager
Error starting NodeManager
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: checksum mismatch
at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:227)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:544)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:591)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: checksum mismatch
at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.openDatabase(NMLeveldbStateStoreService.java:944)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:931)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:204)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
... 5 more

Googling org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: checksum mismatch turned up this suggestion:
URL: https://community.hortonworks.com/questions/86301/nodemanager-fails-to-start.html
Maybe a sst file got corrupt can you try to remove the folder of /var/log/hadoop-yarn/nodemanager/recovery-state from failed nodemanagers and check if starts?
These files stays in the system even if you decomission the nodes.
That folder does not exist on this node, so I went back to the log:

9:56:15.339 AM INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager
registered UNIX signal handlers for [TERM, HUP, INT]
Apr 14, 9:56:16.426 AM INFO org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService
Using state database at /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state for recovery
Apr 14, 9:56:16.457 AM INFO org.apache.hadoop.service.AbstractService
Service org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService failed in state INITED; cause: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: checksum mismatch
org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: checksum mismatch
at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.openDatabase(NMLeveldbStateStoreService.java:944)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:931)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:204)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:227)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:544)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:591)
Apr 14, 9:56:16.466 AM INFO org.apache.hadoop.service.AbstractService
Service NodeManager failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: checksum mismatch
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: checksum mismatch
at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:227)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:544)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:591)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: checksum mismatch
at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.openDatabase(NMLeveldbStateStoreService.java:944)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:931)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:204)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
... 5 more
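
The key line in that log is the one that names the recovery store, since its path differs from the one in the forum answer. As a small illustration, the sketch below pulls that path out of a NodeManager log by matching the "Using state database at ... for recovery" message shown above. The function name `find_state_db_paths` is made up for this example, and the message wording is taken only from this one log, so other Hadoop versions may phrase it differently.

```python
import re

# Matches the NMLeveldbStateStoreService message seen in the log above.
# \s* tolerates logs where the space before the path was lost.
STATE_DB_PATTERN = re.compile(r"Using state database at\s*(\S+)\s+for recovery")


def find_state_db_paths(log_text):
    """Return the distinct recovery-store paths named in the log, in order of appearance."""
    paths = []
    for match in STATE_DB_PATTERN.finditer(log_text):
        path = match.group(1)
        if path not in paths:
            paths.append(path)
    return paths
```

Calling `find_state_db_paths(open(log_file).read())` on the failed node's NodeManager log (where `log_file` is wherever that log lives on your system) should return the directory to inspect.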
  1. The recovery state directory named in the log: /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state
  2. It contains these files: 005615.sst 005638.log 005640.log CURRENT LOCK MANIFEST-004397
  3. Remove these files, then restart the nodemanager.
  4. [...] nodemanager [...]
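
A more cautious variant of clearing the corrupted store is to rename the directory instead of deleting it, so the files can still be inspected or restored. The sketch below is only an illustration under several assumptions: the NodeManager is already stopped, the script runs as a user allowed to rename the directory, and the NodeManager creates a fresh state store on its next start when none exists. The function name `move_state_store_aside` and the `.corrupt-` suffix are made up for this example. Whatever container state was recorded in the old store is not carried over.

```python
import shutil
import time
from pathlib import Path


def move_state_store_aside(state_dir, suffix=None):
    """Rename state_dir to '<name>.corrupt-<suffix>' next to it.

    Returns the backup Path, or None if state_dir does not exist.
    Assumes the NodeManager is stopped while this runs.
    """
    state_dir = Path(state_dir)
    if not state_dir.is_dir():
        return None
    suffix = suffix or time.strftime("%Y%m%d-%H%M%S")
    backup = state_dir.with_name(f"{state_dir.name}.corrupt-{suffix}")
    if backup.exists():
        # Refuse to overwrite an earlier backup.
        raise FileExistsError(str(backup))
    shutil.move(str(state_dir), str(backup))
    return backup
```

For the node in this post the call would be `move_state_store_aside("/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state")`, followed by starting the NodeManager again.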


