Exception in thread "main"org.apache.spark.SparkException: Task not serializable

While running a Spark application, the following error occurred:
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2287)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$flatMapValues$1.apply(PairRDDFunctions.scala:769)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$flatMapValues$1.apply(PairRDDFunctions.scala:768)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.PairRDDFunctions.flatMapValues(PairRDDFunctions.scala:768)
    at com.my.spark.project.userbehaviour.UserClickTrackETL$.main(UserClickTrackETL.scala:72)
    at com.my.spark.project.userbehaviour.UserClickTrackETL.main(UserClickTrackETL.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.NotSerializableException: org.apache.spark.SparkContext
Serialization stack:
    - object not serializable (class: org.apache.spark.SparkContext, value: org.apache.spark.SparkContext@1192b58e)
    - field (class: com.my.spark.project.userbehaviour.UserClickTrackETL$$anonfun$4, name: sc$1, type: class org.apache.spark.SparkContext)
    - object (class com.my.spark.project.userbehaviour.UserClickTrackETL$$anonfun$4, )
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
    ... 20 more
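
Before looking at the application code, note that the cause is generic: any function passed to an RDD transformation is serialized on the driver and shipped to the executors, so everything the closure captures must be serializable, and SparkContext never is. A minimal standalone reproduction (an assumed sketch, not code from this application) that raises the same exception:

    import org.apache.spark.{SparkConf, SparkContext}

    object NotSerializableRepro {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("repro").setMaster("local[*]"))

        val rdd = sc.parallelize(1 to 10)

        // Fails with "Task not serializable": the lambda captures sc,
        // and SparkContext cannot be serialized.
        rdd.map(i => sc.defaultParallelism + i).collect()

        sc.stop()
      }
    }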

The key message in the exception above is that org.apache.spark.SparkContext is not serializable. The offending code segment:
    val sessionRDD = parsedLogRDD.groupBy((rowLog: TrackerLog) => rowLog.getCookie.toString, partitioner).flatMapValues {
      case iter =>
        // sort this cookie's logs by log server time
        val sortedParseLogs = iter.toArray.sortBy(_.getLogServerTime.toString)
        // cut the sorted logs into sessions (30-minute inactivity gap)
        val sessionParsedLogsResult = cutSession(sortedParseLogs)
        // read cookie_label.txt from HDFS -- this is the problem:
        // sc is used inside a closure that runs on the executors
        val cookieLabelRDD = sc.textFile(s"${trackerDataPath}/cookie_label.txt")
        val cLArray = cookieLabelRDD.map(line => {
          val labels = line.split("\\|")
          labels(0) -> labels(1)
        }).collect()

The variable sc here is the SparkContext instance, which exists only on the driver, while the function passed to flatMapValues runs on the executors, so Spark fails when it tries to serialize the closure that captures sc. Moving every use of sc out of the flatMapValues closure resolves the error. The lesson is that we must know which parts of a Spark application run on the driver and which run on the executors: as a rule, the functions passed to RDD APIs execute on the executor side.
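
A minimal sketch of the fix, reusing the names from the snippet above (parsedLogRDD, partitioner, cutSession, TrackerLog, and trackerDataPath are assumed to be defined as in the original job): load and collect the cookie labels on the driver before the transformation, then reference only plain data, or a broadcast of it, inside the closure.

    // Driver side: read cookie_label.txt once, before the transformation,
    // so the flatMapValues closure never captures sc.
    val cLArray = sc.textFile(s"${trackerDataPath}/cookie_label.txt")
      .map { line =>
        val labels = line.split("\\|")
        labels(0) -> labels(1)
      }
      .collect()

    // Broadcast the labels so each executor gets one read-only copy
    // instead of one copy per task.
    val cookieLabelBc = sc.broadcast(cLArray.toMap)

    val sessionRDD = parsedLogRDD
      .groupBy((rowLog: TrackerLog) => rowLog.getCookie.toString, partitioner)
      .flatMapValues { iter =>
        val sortedParseLogs = iter.toArray.sortBy(_.getLogServerTime.toString)
        // Executor side: the closure no longer references sc,
        // so it serializes cleanly.
        cutSession(sortedParseLogs)
      }

Wherever a session's cookie label is needed downstream, it can be looked up with cookieLabelBc.value, which is safe to use inside executor-side code.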
