Submitting Applications with spark-submit

spark-submit was introduced in Spark 1.0 to provide a uniform way to submit applications:

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  ... # other options
  <application-jar> \
  [application-arguments]

 
--class: the entry point of the application (its main class);
--master: the master URL of the cluster;
--deploy-mode: where the driver is deployed: on a node inside the cluster (cluster) or locally on the submitting machine (client);
application-jar: the JAR package containing the application code, which can be placed on HDFS or on the local file system;
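
To make the role of each placeholder concrete, here is a minimal dry-run sketch. It is not part of Spark: the helper name print_submit_cmd is hypothetical, and it only prints the command line that would be run, so it can be tried without a cluster. The class name, master URL and paths are the ones used in the examples in this post.

```shell
#!/bin/sh
# Hypothetical helper (not a Spark tool): assembles a spark-submit command
# line from its four required parts and prints it instead of executing it.
print_submit_cmd() {
  if [ $# -lt 4 ]; then
    echo "usage: print_submit_cmd <main-class> <master-url> <deploy-mode> <application-jar> [application-arguments]" >&2
    return 2
  fi
  main_class=$1
  master_url=$2
  deploy_mode=$3
  app_jar=$4
  shift 4
  cmd="spark-submit --class $main_class --master $master_url --deploy-mode $deploy_mode $app_jar"
  # Anything left over is passed to the application itself, after the JAR.
  if [ $# -gt 0 ]; then
    cmd="$cmd $*"
  fi
  echo "$cmd"
}

# Example call with the values used elsewhere in this post.
print_submit_cmd com.luogankun.spark.WordCount spark://hadoop000:7077 client \
  /home/spark/data/spark.jar hdfs://hadoop000:8020/hello.txt
```

The point of the sketch is the ordering: options come first, then the application JAR, and everything after the JAR is handed to the application's main method rather than to spark-submit.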
 
Standalone mode example:

spark-submit \
  --name SparkSubmit_Demo \
  --class com.luogankun.spark.WordCount \
  --master spark://hadoop000:7077 \
  --executor-memory 1G \
  --total-executor-cores 1 \
  /home/spark/data/spark.jar \
  hdfs://hadoop000:8020/hello.txt

 
In standalone mode, --master must be set to the address of the Spark cluster's master (spark://host:port);
 
yarn-client mode example:

spark-submit \
  --name SparkSubmit_Demo \
  --class com.luogankun.spark.WordCount \
  --master yarn-client \
  --executor-memory 1G \
  --num-executors 1 \
  /home/spark/data/spark.jar \
  hdfs://hadoop000:8020/hello.txt

 
yarn-cluster mode example:

spark-submit \
  --name SparkSubmit_Demo \
  --class com.luogankun.spark.WordCount \
  --master yarn-cluster \
  --executor-memory 1G \
  --num-executors 1 \
  /home/spark/data/spark.jar \
  hdfs://hadoop000:8020/hello.txt

 
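Note: HADOOP_CONF_DIR must be set to the directory containing the Hadoop client configuration files before submitting to YARN.
Note: --total-executor-cores, used in the standalone example, is not a YARN option; on YARN the number of executors and the cores per executor are set with --num-executors and --executor-cores.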
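
The HADOOP_CONF_DIR requirement can be checked before submitting. The following is a small pre-flight sketch, assuming a POSIX shell; the function name check_yarn_conf is hypothetical. It also accepts YARN_CONF_DIR, which the Spark documentation lists as an alternative to HADOOP_CONF_DIR.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not a Spark tool): succeeds only if
# HADOOP_CONF_DIR, or YARN_CONF_DIR as a fallback, names an existing directory.
check_yarn_conf() {
  conf_dir=${HADOOP_CONF_DIR:-${YARN_CONF_DIR:-}}
  if [ -z "$conf_dir" ]; then
    echo "HADOOP_CONF_DIR is not set" >&2
    return 1
  fi
  if [ ! -d "$conf_dir" ]; then
    echo "configuration directory not found: $conf_dir" >&2
    return 1
  fi
  return 0
}
```

A submission script could call check_yarn_conf first and run spark-submit only if it succeeds, which turns a confusing connection error into an immediate, readable message.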
 
The difference between yarn-client and yarn-cluster lies in where the Driver runs.
yarn-client:
The Driver runs inside the client process, and the ApplicationMaster is used only to obtain resources. Results are printed to the client console in real time and log information is easy to see, which is why this mode is recommended;
After the application is submitted, YARN first starts the ApplicationMaster and then the Executors, each of which runs in a container. Note: only one ExecutorBackend runs per container;
yarn-cluster:
The Driver runs together with the ApplicationMaster, so results cannot be displayed on the client console and need to be stored in HDFS or written to a database;
Because the Driver runs on the cluster, its status can be checked through the web UI.
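
Since the two modes differ only in where the Driver runs, the same choice can be written as a master plus a deploy mode. In newer Spark releases (2.0 and later) the combined values yarn-client and yarn-cluster are deprecated in favor of --master yarn with an explicit --deploy-mode. The sketch below shows that mapping; the helper name master_args is hypothetical and not part of Spark.

```shell
#!/bin/sh
# Hypothetical helper (not a Spark tool): translates the legacy master values
# used in this post into the equivalent --master / --deploy-mode options.
master_args() {
  case "$1" in
    yarn-client)  echo "--master yarn --deploy-mode client" ;;   # Driver in the client process
    yarn-cluster) echo "--master yarn --deploy-mode cluster" ;;  # Driver inside the ApplicationMaster
    *)            echo "--master $1" ;;                          # e.g. spark://host:port, passed through
  esac
}

master_args yarn-client
master_args yarn-cluster
```

With this form, switching between the two behaviors described above means changing only the --deploy-mode value; the rest of the command stays the same.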
 
 
 
 
 
 
 
 
 
