Flink on YARN fails with "The main method caused an error: Could not deploy Yarn job cluster": diagnosis and fix
flink run -m yarn-cluster -p 2 -yjm 700m -ytm 1024m -c WordCount target/bbb-1.0-SNAPSHOT.jar
The full error is as follows:
The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Could not deploy Yarn job cluster.
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:335)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:205)
at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:138)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:662)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:210)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:893)
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:966)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:966)
Caused by: org.apache.flink.client.deployment.ClusterDeploymentException: Could not deploy Yarn job cluster.
at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:398)
at org.apache.flink.client.deployment.executors.AbstractJobClusterExecutor.execute(AbstractJobClusterExecutor.java:70)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1733)
at org.apache.flink.streaming.api.environment.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:94)
at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:63)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1620)
at WordCount.main(WordCount.java:47)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:321)
... 11 more
Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1591614969089_0002 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1591614969089_0002_000001 exited with exitCode: 1
Failing this attempt.Diagnostics: [2020-06-08 19:18:12.457]Exception from container-launch.
Container id: container_1591614969089_0002_01_000001
Exit code: 1
[2020-06-08 19:18:12.466]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
[2020-06-08 19:18:12.467]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
For more detailed output, check the application tracking page: http://Desktop:8188/applicationhistory/app/application_1591614969089_0002 Then click on links to logs of each attempt.
. Failing the application.
If log aggregation is enabled on your cluster, use this command to further investigate the issue:
yarn logs -applicationId application_1591614969089_0002
at org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(YarnClusterDescriptor.java:999)
at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:488)
at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:391)
... 22 more
2020-06-08 19:18:12,659 INFO org.apache.flink.yarn.YarnClusterDescriptor - Cancelling deployment from Deployment Failure Hook
2020-06-08 19:18:12,660 INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at Desktop/192.168.0.103:8032
2020-06-08 19:18:12,661 INFO org.apache.hadoop.yarn.client.AHSProxy - Connecting to Application History server at Desktop/192.168.0.103:10201
2020-06-08 19:18:12,661 INFO org.apache.flink.yarn.YarnClusterDescriptor - Killing YARN application
2020-06-08 19:18:12,668 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Killed application application_1591614969089_0002
2020-06-08 19:18:12,769 INFO org.apache.flink.yarn.YarnClusterDescriptor - Deleting files in hdfs://Desktop:9000/user/appleyuchi/.flink/application_1591614969089_0002.
This error is fairly hard to track down. First check that Hadoop's log servers are running, that is, check whether jps lists the following process.
JobHistoryServer; start it with:
$HADOOP_HOME/bin/mapred --daemon start historyserver
Start the timeline server:
yarn timelineserver
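To confirm that both daemons are actually up before resubmitting the job, a check along these lines can help. It is only a sketch: the function name missing_daemons is made up for illustration, and it assumes the two servers show up in jps as JobHistoryServer and ApplicationHistoryServer, which is how they usually appear on Hadoop 3.x but may differ on your setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `jps` output on stdin and prints the name of
# each expected daemon that is NOT present in the listing.
# Assumption: the history server and timeline server appear in jps as
# JobHistoryServer and ApplicationHistoryServer respectively.
missing_daemons() {
  local listing name
  listing="$(cat)"
  for name in JobHistoryServer ApplicationHistoryServer; do
    # jps prints "<pid> <main class>"; match the class name as a whole word.
    if ! printf '%s\n' "$listing" | grep -qw "$name"; then
      printf '%s\n' "$name"
    fi
  done
}

# Typical use on the Hadoop node:
#   jps | missing_daemons
```

If the function prints nothing, both daemons were found in the jps listing.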
Once these steps are done, the ports of the YARN web UI become reachable.
####################################
Then, in the logs shown in the YARN web UI, I saw the following error:
2020-06-08 19:21:02,071 INFO org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Shutting YarnJobClusterEntrypoint down with application status FAILED. Diagnostics org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:261)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:215)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:169)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:168)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:518)
at org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint.main(YarnJobClusterEntrypoint.java:119)
Caused by: java.net.BindException: Could not start rest endpoint on any port in port range 8082
at org.apache.flink.runtime.rest.RestServerEndpoint.start(RestServerEndpoint.java:228)
at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:165)
... 9 more
.
2020-06-08 19:21:02,076 INFO org.apache.flink.runtime.blob.BlobServer - Stopped BLOB server at 0.0.0.0:37633
2020-06-08 19:21:02,077 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopping Akka RPC service.
2020-06-08 19:21:02,082 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopping Akka RPC service.
2020-06-08 19:21:02,087 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
2020-06-08 19:21:02,088 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
2020-06-08 19:21:02,095 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
2020-06-08 19:21:02,095 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
2020-06-08 19:21:02,110 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
2020-06-08 19:21:02,110 INFO akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
2020-06-08 19:21:02,130 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopped Akka RPC service.
2020-06-08 19:21:02,131 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopped Akka RPC service.
2020-06-08 19:21:02,132 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Could not start cluster entrypoint YarnJobClusterEntrypoint.
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint YarnJobClusterEntrypoint.
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:187)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:518)
at org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint.main(YarnJobClusterEntrypoint.java:119)
Caused by: org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:261)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:215)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:169)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:168)
... 2 more
Caused by: java.net.BindException: Could not start rest endpoint on any port in port range 8082
at org.apache.flink.runtime.rest.RestServerEndpoint.start(RestServerEndpoint.java:228)
at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:165)
... 9 more
##############################################################
It is a port problem, yet the port was not actually occupied, which left me puzzled for a while.
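One way to double-check that a port really is free is to look at the listening sockets. The sketch below assumes a Linux host where `ss -ltn` is available; the function name port_is_listening is made up for illustration, and it only parses the listing passed to it on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `ss -ltn` output on stdin and succeeds (exit 0)
# if some socket is listening on the given port, otherwise fails (exit 1).
port_is_listening() {
  awk -v port="$1" '
    # Column 4 is "Local Address:Port", e.g. 0.0.0.0:8081 or [::]:8081.
    $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Typical use:
#   ss -ltn | port_is_listening 8082 && echo "8082 is taken" || echo "8082 is free"
```

A free port combined with a BindException, as in the log above, suggests looking at the configuration rather than at a competing process.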
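Cause of the error:
The port must be the same in two files, flink-conf.yaml and masters. I forgot to update the masters file, which led to the convoluted error above.
I had changed the default port 8081 to 8082 because 8081 was already taken by Spark; at the time I edited flink-conf.yaml and then forgot about the rest.
Final solution: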
flink-conf.yaml: rest.port: 8082
masters: Desktop:8082
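Since the root cause was a mismatch between these two files, a small consistency check can catch it before the job is submitted. This is a sketch under assumptions: the function names are invented here, flink-conf.yaml is assumed to contain a plain `rest.port: <number>` line, and masters is assumed to hold `host:port` entries, of which only the first is inspected.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing the ports in the two config files.

# Prints the rest.port value from a flink-conf.yaml-style file.
rest_port_of() {
  sed -n 's/^rest\.port:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' "$1" | head -n 1
}

# Prints the port of the first host:port entry in a masters-style file.
masters_port_of() {
  sed -n 's/^[^#:][^:]*:\([0-9][0-9]*\)[[:space:]]*$/\1/p' "$1" | head -n 1
}

# Succeeds only when both files name the same, non-empty port.
ports_match() {
  local a b
  a="$(rest_port_of "$1")"
  b="$(masters_port_of "$2")"
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Typical use (the conf directory path is an assumption):
#   ports_match "$FLINK_HOME/conf/flink-conf.yaml" "$FLINK_HOME/conf/masters" \
#     || echo "rest.port and masters disagree"
```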
Also remember to sync both files to the other nodes in the cluster.
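How the files get copied depends on the cluster. As one possible sketch, the helper below only prints the scp commands so they can be reviewed first. The function name, the host names, and the install path in the example are all hypothetical, and it assumes Flink is installed under the same path on every node.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints one scp command per host for the two files.
# Nothing is copied until the printed commands are actually executed.
# Assumption: the conf directory has the same path on every node.
print_sync_commands() {
  local conf_dir="$1"; shift
  local host
  for host in "$@"; do
    printf 'scp %s/flink-conf.yaml %s/masters %s:%s/\n' \
      "$conf_dir" "$conf_dir" "$host" "$conf_dir"
  done
}

# Example (hypothetical host names and install path):
#   print_sync_commands /opt/flink/conf worker1 worker2        # review
#   print_sync_commands /opt/flink/conf worker1 worker2 | sh   # run
```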
Then close all open terminals and open a new one; the configuration files only take effect in a newly opened terminal.