NodeManager startup error
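Environment: Ubuntu 14.04 Server, Cloudera CDH 5.10, 28 nodes in total. Each node has 128 GB of memory. 6 machines have 24 cores and 22 machines have 32 cores. 10 machines have 1.8 TB of disk and 18 machines have 12.8 TB.
The ResourceManager and NameNode run on the same node, which hosts neither a DataNode nor a NodeManager.
The other 28 nodes all run both a DataNode and a NodeManager.
The cluster was recently used for a TB-scale data randomization experiment. HDFS data ended up unevenly distributed across the nodes, and the disks of all 10 machines with 1.8 TB filled up completely. With no space left on those nodes, 9 NodeManagers stopped providing service, and 1 NodeManager exited and could not be started again.
The plan was to decommission the DataNode role on the 10 small-disk machines, let their data be redistributed to the large-disk machines, and then delete the HDFS data on them.
While the decommission was running, I added one more machine to the cluster (32 cores, 128 GB of memory, 1.8 TB of disk) and assigned it the NodeManager role.
Once all of this had completed successfully, I restarted YARN.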
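To see why the small machines fill up long before the cluster does, here is a rough back-of-the-envelope sketch in Python. It uses the disk sizes quoted above and ASSUMES, purely for illustration, that every node receives an equal share of the raw data (replicas included). That is a simplification and not a description of how HDFS actually places blocks; the function names are made up for this example.

```python
# Disk layout taken from the article (sizes in TB).
SMALL_NODES, SMALL_TB = 10, 1.8
LARGE_NODES, LARGE_TB = 18, 12.8


def cluster_capacity_tb() -> float:
    """Total raw disk capacity of all worker nodes."""
    return SMALL_NODES * SMALL_TB + LARGE_NODES * LARGE_TB


def raw_tb_when_small_nodes_fill() -> float:
    """Raw data volume at which the small nodes are full.

    ASSUMPTION: data is spread evenly per node, so each of the
    28 nodes holds the same number of bytes.
    """
    return (SMALL_NODES + LARGE_NODES) * SMALL_TB


def cluster_utilization_at_that_point() -> float:
    """Fraction of total capacity in use when the small nodes are full."""
    return raw_tb_when_small_nodes_fill() / cluster_capacity_tb()


if __name__ == "__main__":
    print(f"capacity:            {cluster_capacity_tb():.1f} TB")
    print(f"small nodes full at: {raw_tb_when_small_nodes_fill():.1f} TB")
    print(f"cluster utilization: {cluster_utilization_at_that_point():.0%}")
```

Under that assumption the 1.8 TB machines are full at roughly 50 TB of raw data, which is only about 20% of the 248.4 TB total, so a mixed-capacity cluster can run out of space on some nodes while looking mostly empty overall.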
Most machines came up normally; only the NodeManager that had exited earlier still could not be started. Its log showed:
FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager
Error starting NodeManager
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: checksum mismatch
    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:227)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:544)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:591)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: checksum mismatch
    at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
    at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
    at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.openDatabase(NMLeveldbStateStoreService.java:944)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:931)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:204)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    ... 5 more
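When a trace like this is long, the useful part is usually the innermost "Caused by:" entry. The following is a minimal Python sketch for pulling it out of a saved log; the function name root_cause and the regular expression are my own illustration and are not part of Hadoop or any logging tool. It assumes the trace has one entry per line.

```python
import re

# Matches lines such as:
#   Caused by: some.pkg.Outer$Inner: message text
CAUSED_BY = re.compile(r"^Caused by: (?P<cls>[\w.$]+)(?:: (?P<msg>.*))?$")


def root_cause(trace: str):
    """Return (exception class, message) of the last 'Caused by:' line.

    Returns None when the trace has no 'Caused by:' line at all.
    """
    cause = None
    for line in trace.splitlines():
        match = CAUSED_BY.match(line.strip())
        if match:
            # Later 'Caused by:' lines are deeper causes, so keep the last one.
            cause = (match.group("cls"), match.group("msg") or "")
    return cause


if __name__ == "__main__":
    sample = (
        "org.apache.hadoop.service.ServiceStateException: wrapped\n"
        "    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)\n"
        "Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: "
        "Corruption: checksum mismatch\n"
        "    at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)\n"
    )
    print(root_cause(sample))
```

For the trace in this post, the innermost cause is the LevelDB exception, which points at the NodeManager's local state store and not at YARN configuration.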
Googling org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: checksum mismatch turned up the following suggestion.
URL: https://community.hortonworks.com/questions/86301/nodemanager-fails-to-start.html
Maybe a sst file got corrupt can you try to remove the folder of /var/log/hadoop-yarn/nodemanager/recovery-state from failed nodemanagers and check if starts?
These files stays in the system even if you decomission the nodes.
That folder does not exist on this node, so I kept reading the log:
9:56:15.339 AM INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager
registered UNIX signal handlers for [TERM, HUP, INT]
Apr 14, 9:56:16.426 AM INFO org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService
Using state database at /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state for recovery
Apr 14, 9:56:16.457 AM INFO org.apache.hadoop.service.AbstractService
Service org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService failed in state INITED; cause: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: checksum mismatch
org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: checksum mismatch
    at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
    at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
    at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.openDatabase(NMLeveldbStateStoreService.java:944)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:931)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:204)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:227)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:544)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:591)
Apr 14, 9:56:16.466 AM INFO org.apache.hadoop.service.AbstractService
Service NodeManager failed in state INITED; cause: org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: checksum mismatch
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: checksum mismatch
    at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:172)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:227)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:544)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:591)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: Corruption: checksum mismatch
    at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
    at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
    at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.openDatabase(NMLeveldbStateStoreService.java:944)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:931)
    at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:204)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    ... 5 more
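The key line in this log is the one that names the state database, because it gives the real location of the recovery store on this particular installation (the path from the forum answer did not apply here). As a small sketch, the path can be pulled out of a log with a regular expression. The helper name state_db_path is hypothetical, and the pattern is based only on the wording of the log line shown above. As far as I know the location is governed by the yarn.nodemanager.recovery.dir property, but treat that as something to verify against your own configuration.

```python
import re

# Based on the log wording in this post:
#   "Using state database at <path> for recovery"
# \s* tolerates logs where the space before the path was lost.
STATE_DB = re.compile(r"Using state database at\s*(?P<path>/\S+) for recovery")


def state_db_path(log_text: str):
    """Return the recovery state database path found in the log, or None."""
    match = STATE_DB.search(log_text)
    return match.group("path") if match else None


if __name__ == "__main__":
    line = (
        "Using state database at "
        "/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state for recovery"
    )
    print(state_db_path(line))
```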
Files under the directory /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state:
005615.sst
005638.log
005640.log
CURRENT
LOCK
MANIFEST-004397
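After removing these files, the NodeManager could be started again.
In short, the full disk apparently made the NodeManager exit abnormally and left its recovery state store corrupted, so on the next start the NodeManager failed the LevelDB checksum check while opening that store and could not come up.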
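If you need to clear a corrupted state store, a more cautious option than deleting it is to rename the directory so it can still be inspected or restored later. The sketch below is only an illustration under stated assumptions: the function name set_aside_state_dir and the ".corrupt-<timestamp>" naming are my own invention, the NodeManager on that host must already be stopped, and whatever recovery state the store held is given up.

```python
import shutil
import time
from pathlib import Path


def set_aside_state_dir(state_dir, suffix=None):
    """Rename a (presumed corrupted) state directory instead of deleting it.

    ASSUMPTIONS: the NodeManager on this host is stopped, and losing the
    recovery state is acceptable. Returns the new path of the renamed copy.
    """
    src = Path(state_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"state directory not found: {src}")

    # Hypothetical naming scheme: yarn-nm-state.corrupt-20170414-095616
    suffix = suffix or time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.name}.corrupt-{suffix}")
    if dst.exists():
        raise FileExistsError(f"backup target already exists: {dst}")

    shutil.move(str(src), str(dst))
    return dst


if __name__ == "__main__":
    # Path as reported in the NodeManager log of this post.
    moved = set_aside_state_dir(
        "/var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state"
    )
    print(f"moved corrupted state store to {moved}")
```

Renaming keeps the .sst, .log and MANIFEST files around for later analysis, and the renamed copy can be deleted once the node has been running normally for a while.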