Flink on YARN error "The main method caused an error: Could not deploy Yarn job cluster": troubleshooting + fix

The error appeared when running:
flink run -m yarn-cluster -p 2 -yjm 700m -ytm 1024m -c WordCount target/bbb-1.0-SNAPSHOT.jar
The full error output is as follows:
 The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Could not deploy Yarn job cluster.
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:335)
	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:205)
	at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:138)
	at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:662)
	at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:210)
	at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:893)
	at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:966)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
	at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:966)
Caused by: org.apache.flink.client.deployment.ClusterDeploymentException: Could not deploy Yarn job cluster.
	at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:398)
	at org.apache.flink.client.deployment.executors.AbstractJobClusterExecutor.execute(AbstractJobClusterExecutor.java:70)
	at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1733)
	at org.apache.flink.streaming.api.environment.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:94)
	at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:63)
	at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1620)
	at WordCount.main(WordCount.java:47)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:321)
	... 11 more
Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment. 
Diagnostics from YARN: Application application_1591614969089_0002 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1591614969089_0002_000001 exited with  exitCode: 1
Failing this attempt.Diagnostics: [2020-06-08 19:18:12.457]Exception from container-launch.
Container id: container_1591614969089_0002_01_000001
Exit code: 1

[2020-06-08 19:18:12.466]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :

[2020-06-08 19:18:12.467]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :

For more detailed output, check the application tracking page: http://Desktop:8188/applicationhistory/app/application_1591614969089_0002 Then click on links to logs of each attempt.
. Failing the application.
If log aggregation is enabled on your cluster, use this command to further investigate the issue:
yarn logs -applicationId application_1591614969089_0002
	at org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(YarnClusterDescriptor.java:999)
	at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:488)
	at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:391)
	... 22 more
2020-06-08 19:18:12,659 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Cancelling deployment from Deployment Failure Hook
2020-06-08 19:18:12,660 INFO  org.apache.hadoop.yarn.client.RMProxy                         - Connecting to ResourceManager at Desktop/192.168.0.103:8032
2020-06-08 19:18:12,661 INFO  org.apache.hadoop.yarn.client.AHSProxy                        - Connecting to Application History server at Desktop/192.168.0.103:10201
2020-06-08 19:18:12,661 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Killing YARN application
2020-06-08 19:18:12,668 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl         - Killed application application_1591614969089_0002
2020-06-08 19:18:12,769 INFO  org.apache.flink.yarn.YarnClusterDescriptor                   - Deleting files in hdfs://Desktop:9000/user/appleyuchi/.flink/application_1591614969089_0002.

This error is relatively hard to track down. First make sure Hadoop's log servers are running, i.e. check that jps lists the following processes.
JobHistoryServer, start command:
$HADOOP_HOME/bin/mapred --daemon start historyserver
Start the timeline server:
yarn timelineserver
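To confirm that both daemons actually came up, the jps listing can be filtered for their process names. The following is a minimal sketch, not part of the original setup: the function name `missing_daemons` is hypothetical, and it assumes the timeline server shows up in jps as `ApplicationHistoryServer` and the history server as `JobHistoryServer`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `jps` output on stdin and prints the names of the
# expected log daemons that are NOT present (prints an empty line if all are up).
missing_daemons() {
  local jps_output missing=""
  jps_output=$(cat)
  # Assumed process names as reported by jps.
  for name in JobHistoryServer ApplicationHistoryServer; do
    if ! printf '%s\n' "$jps_output" | grep -qw "$name"; then
      missing="$missing $name"
    fi
  done
  # Strip the leading space before printing.
  echo "${missing# }"
}
```

Typical use would be `jps | missing_daemons`; an empty result means both processes were found.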
Once these steps are done, the ports of the YARN web UI become reachable.
####################################
Then, in the logs shown in the YARN web UI, I found the following error:
2020-06-08 19:21:02,071 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Shutting YarnJobClusterEntrypoint down with application status FAILED. Diagnostics org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
	at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:261)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:215)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:169)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:168)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:518)
	at org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint.main(YarnJobClusterEntrypoint.java:119)
Caused by: java.net.BindException: Could not start rest endpoint on any port in port range 8082
	at org.apache.flink.runtime.rest.RestServerEndpoint.start(RestServerEndpoint.java:228)
	at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:165)
	... 9 more
.
2020-06-08 19:21:02,076 INFO  org.apache.flink.runtime.blob.BlobServer                      - Stopped BLOB server at 0.0.0.0:37633
2020-06-08 19:21:02,077 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Stopping Akka RPC service.
2020-06-08 19:21:02,082 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Stopping Akka RPC service.
2020-06-08 19:21:02,087 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator         - Shutting down remote daemon.
2020-06-08 19:21:02,088 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator         - Remote daemon shut down; proceeding with flushing remote transports.
2020-06-08 19:21:02,095 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator         - Shutting down remote daemon.
2020-06-08 19:21:02,095 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator         - Remote daemon shut down; proceeding with flushing remote transports.
2020-06-08 19:21:02,110 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator         - Remoting shut down.
2020-06-08 19:21:02,110 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator         - Remoting shut down.
2020-06-08 19:21:02,130 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Stopped Akka RPC service.
2020-06-08 19:21:02,131 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Stopped Akka RPC service.
2020-06-08 19:21:02,132 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Could not start cluster entrypoint YarnJobClusterEntrypoint.
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint YarnJobClusterEntrypoint.
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:187)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:518)
	at org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint.main(YarnJobClusterEntrypoint.java:119)
Caused by: org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
	at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:261)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:215)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:169)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:168)
	... 2 more
Caused by: java.net.BindException: Could not start rest endpoint on any port in port range 8082
	at org.apache.flink.runtime.rest.RestServerEndpoint.start(RestServerEndpoint.java:228)
	at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:165)
	... 9 more

##############################################################
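So it is a port problem, yet the port was not occupied, which left me baffled for a moment.
Cause of the error:
The port must be the same in two files, flink-conf.yaml and masters. I forgot to update the masters file, and that led to the convoluted errors above.
The default port 8081 was changed to 8082 because 8081 was already taken by Spark. I had edited flink-conf.yaml at the time and then forgot about it.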
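One way to double-check that a port really is free is to look at the listening sockets. The sketch below is an illustration under assumptions: it expects the output format of `ss -ltn` (from iproute2 on Linux), where the fourth column is the local address and port, and the function name `port_is_listening` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `ss -ltn` output on stdin and exits with status 0
# if some socket is listening on the given port, 1 otherwise.
port_is_listening() {
  awk -v suffix=":$1" '
    # Column 4 is "Local Address:Port"; compare its tail with ":<port>".
    $1 == "LISTEN" && substr($4, length($4) - length(suffix) + 1) == suffix { found = 1 }
    END { exit !found }
  '
}
```

For example, `ss -ltn | port_is_listening 8082 && echo "8082 is taken" || echo "8082 is free"` reports whether anything on the current machine holds the port.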
 
Final solution:
flink-conf.yaml:rest.port: 8082
masters:Desktop:8082
Do not forget to sync these two files to the other nodes in the cluster.
Close all open terminals and open a new one; the configuration only takes effect in a newly opened terminal.
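Since the root cause was two files drifting apart, a small check can catch the mismatch before the next submission. This is a minimal sketch under assumptions: the function names are hypothetical, it only understands a plain `rest.port: <number>` line and `host:port` entries, and the `$FLINK_HOME/conf` location in the usage note is assumed rather than taken from the setup above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing the REST port in the two config files.

# Print the value of rest.port from a flink-conf.yaml file (empty if not set).
rest_port_from_conf() {
  sed -n 's/^rest\.port:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Print the port of the first non-comment host:port entry in a masters file.
masters_port() {
  sed -n 's/^[^#]*:\([0-9][0-9]*\)[[:space:]]*$/\1/p' "$1" | head -n 1
}

# Compare both values; returns 0 when they match, 1 otherwise.
check_rest_port() {
  local conf_port masters_port_value
  conf_port=$(rest_port_from_conf "$1")
  masters_port_value=$(masters_port "$2")
  if [ -n "$conf_port" ] && [ "$conf_port" = "$masters_port_value" ]; then
    echo "OK: both files use port $conf_port"
  else
    echo "MISMATCH: flink-conf.yaml=${conf_port:-unset} masters=${masters_port_value:-unset}"
    return 1
  fi
}
```

It could be run as `check_rest_port "$FLINK_HOME/conf/flink-conf.yaml" "$FLINK_HOME/conf/masters"` on each node after syncing the files.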