Installing Spark on GCE (on Ubuntu, of course)
Installing Spark on Google Compute Engine
Rather than using one of GCE's quick-deploy solutions, we create a fresh instance and install Spark on it by hand.
Prerequisites
A Google Cloud project, with the gcloud CLI set up on your local machine (step 4 below pastes a gcloud command into a local terminal).
Procedure (create the instance)
1. On the VM instances screen, click the button to create a new instance.
2. Pick a machine type and other settings as appropriate; for the boot disk, choose Ubuntu (whichever version you prefer).
3. On the created instance, select "Connect with gcloud".
4. Paste the command line it shows into a local Ubuntu terminal.
You can now connect from your local machine to the newly created Ubuntu instance on GCE.
All of the remaining steps are executed on the VM you connected to with gcloud.
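The "Connect with gcloud" button generates the SSH command for you. As a rough sketch of its shape (the instance name and zone below are placeholders, not necessarily the ones used in this article), the pasted command looks like this; here we only print it rather than run it:

```shell
# Placeholders -- substitute your own instance name and zone.
INSTANCE="instance-2"
ZONE="us-central1-a"

# The console's "Connect with gcloud" button produces a command of this shape.
CMD="gcloud compute ssh --zone ${ZONE} ${INSTANCE}"
echo "${CMD}"
```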
Procedure (installation)
1. Install Java 8
The stock Ubuntu VM has no Java, so install it.
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer
Answer the prompts ([ENTER], [Y], [OK], [yes], and so on) as they appear during the installation.
Once the installation finishes, check it:
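If you would rather avoid the interactive prompts entirely, the Oracle license can be pre-accepted through debconf before running apt-get. This is a sketch: the debconf key below is the one commonly reported for the webupd8team installer, so treat it as an assumption and verify it on your VM.

```shell
# Assumed debconf key for the webupd8team oracle-java8-installer package.
SELECTION="oracle-java8-installer shared/accepted-oracle-license-v1-1 select true"

# On the VM you would feed this to debconf before installing:
#   echo "${SELECTION}" | sudo debconf-set-selections
echo "${SELECTION}"
```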
junk@instance-2:~$ java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
junk@instance-2:~$
Looks OK.
2. Download Scala
$ cd ~
$ mkdir dl
$ cd dl
$ wget http://www.scala-lang.org/files/archive/scala-2.11.7.tgz
--2015-07-06 16:04:20-- http://www.scala-lang.org/files/archive/scala-2.11.7.tgz
Resolving www.scala-lang.org (www.scala-lang.org)... 128.178.154.159
Connecting to www.scala-lang.org (www.scala-lang.org)|128.178.154.159|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 28460530 (27M) [application/x-gzip]
Saving to: ‘scala-2.11.7.tgz’
scala-2.11.7.tgz 100%[======================================================>] 27.14M 5.57MB/s in 8.3s
2015-07-06 16:04:29 (3.27 MB/s) - ‘scala-2.11.7.tgz’ saved [28460530/28460530]
3. Extract
$ tar -xzvf scala-2.11.7.tgz
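Before extracting, it can be worth checking that the download is a valid gzip archive — an interrupted wget leaves a truncated file that tar rejects halfway through. A self-contained demonstration of the check (it builds its own tiny archive in a scratch directory so it doesn't depend on the download):

```shell
# Build a tiny .tgz in a scratch directory, then verify it the same
# way you would verify scala-2.11.7.tgz.
workdir="$(mktemp -d)"
echo "hello" > "${workdir}/file.txt"
tar -czf "${workdir}/demo.tgz" -C "${workdir}" file.txt

# "tar -tzf" lists the archive without extracting; a truncated or
# corrupt file makes it exit non-zero.
if tar -tzf "${workdir}/demo.tgz" > /dev/null 2>&1; then
  RESULT="archive OK"
else
  RESULT="archive corrupt"
fi
echo "${RESULT}"
rm -rf "${workdir}"
```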
4. Copy the extracted Scala into place and create a symlink.
$ cd /usr/local/
$ sudo cp -r ~/dl/scala-2.11.7 .
$ sudo ln -sv scala-2.11.7/ scala
‘scala’ -> ‘scala-2.11.7/’
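The point of the symlink is that everything else can refer to the stable path /usr/local/scala while the versioned directory changes underneath it. The same pattern, demonstrated in a scratch directory so it doesn't touch /usr/local:

```shell
# Versioned directory plus a stable symlink, in a throwaway location.
workdir="$(mktemp -d)"
cd "${workdir}"
mkdir scala-2.11.7
ln -sv scala-2.11.7 scala   # no trailing slash, so readlink output is clean

# The symlink resolves to the versioned directory.
TARGET="$(readlink scala)"
echo "${TARGET}"
cd / && rm -rf "${workdir}"
```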
5. Download Spark
$ cd ~/dl
$ wget http://archive.apache.org/dist/spark/spark-1.4.0/spark-1.4.0-bin-hadoop2.6.tgz
--2015-07-06 16:11:16-- http://archive.apache.org/dist/spark/spark-1.4.0/spark-1.4.0-bin-hadoop2.6.tgz
Resolving archive.apache.org (archive.apache.org)... 192.87.106.229, 140.211.11.131, 2001:610:1:80bc:192:87:106:229
Connecting to archive.apache.org (archive.apache.org)|192.87.106.229|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 250194134 (239M) [application/x-tar]
Saving to: ‘spark-1.4.0-bin-hadoop2.6.tgz’
spark-1.4.0-bin-hadoop2.6.tgz 100%[======================================================>] 238.60M 6.62MB/s in 45s
2015-07-06 16:12:02 (5.32 MB/s) - ‘spark-1.4.0-bin-hadoop2.6.tgz’ saved [250194134/250194134]
6. Extract
$ tar -xzvf spark-1.4.0-bin-hadoop2.6.tgz
7. Copy the extracted Spark into place and create a symlink, same as before.
$ cd /usr/local/
$ sudo cp -r ~/dl/spark-1.4.0-bin-hadoop2.6 .
$ sudo ln -sv spark-1.4.0-bin-hadoop2.6/ spark
‘spark’ -> ‘spark-1.4.0-bin-hadoop2.6/’
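This layout also makes upgrades painless: install the new versioned directory next to the old one and re-point the symlink. A sketch (the newer version number here is purely illustrative):

```shell
workdir="$(mktemp -d)"
cd "${workdir}"
mkdir spark-1.4.0-bin-hadoop2.6 spark-1.4.1-bin-hadoop2.6
ln -s spark-1.4.0-bin-hadoop2.6 spark

# Re-point the link: -f overwrites the existing link, -n treats it as a
# plain file instead of descending into the directory it points at.
ln -sfn spark-1.4.1-bin-hadoop2.6 spark
NEWTARGET="$(readlink spark)"
echo "${NEWTARGET}"
cd / && rm -rf "${workdir}"
```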
8. Set up paths
$ vi ~/.bashrc
Add the following to the end of .bashrc:
export SCALA_HOME=/usr/local/scala
export SPARK_HOME=/usr/local/spark
export PATH=$SCALA_HOME/bin:$PATH
Reload it:
$ source ~/.bashrc
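One caveat: if you ever re-run this setup, the exports get appended to .bashrc again each time. A small grep guard keeps the append idempotent; the sketch below works on a scratch file standing in for your real .bashrc:

```shell
# Scratch file standing in for ~/.bashrc.
RC="$(mktemp)"

add_once() {
  # Append the line only if it is not already present,
  # matching the whole line as a fixed string.
  grep -qxF "$1" "${RC}" || echo "$1" >> "${RC}"
}

add_once 'export SCALA_HOME=/usr/local/scala'
add_once 'export SPARK_HOME=/usr/local/spark'
add_once 'export PATH=$SCALA_HOME/bin:$PATH'
# Running it a second time changes nothing.
add_once 'export SCALA_HOME=/usr/local/scala'

COUNT="$(grep -c 'SCALA_HOME=/usr/local/scala' "${RC}")"
echo "${COUNT}"
rm -f "${RC}"
```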
9. Launch
$ cd $SPARK_HOME
$ ./bin/spark-shell
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/07/06 16:24:33 INFO SecurityManager: Changing view acls to: junk
15/07/06 16:24:33 INFO SecurityManager: Changing modify acls to: junk
15/07/06 16:24:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(junk); users with modify permissions: Set(junk)
15/07/06 16:24:33 INFO HttpServer: Starting HTTP Server
15/07/06 16:24:33 INFO Utils: Successfully started service 'HTTP class server' on port 45846.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.4.0
/_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45)
Type in expressions to have them evaluated.
Type :help for more information.
15/07/06 16:24:38 INFO SparkContext: Running Spark version 1.4.0
15/07/06 16:24:38 INFO SecurityManager: Changing view acls to: junk
15/07/06 16:24:38 INFO SecurityManager: Changing modify acls to: junk
15/07/06 16:24:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(junk); users with modify permissions: Set(junk)
15/07/06 16:24:39 INFO Slf4jLogger: Slf4jLogger started
15/07/06 16:24:39 INFO Remoting: Starting remoting
Mon Jul 06 16:24:42 UTC 2015 Thread[main,5,main] java.io.FileNotFoundException: derby.log (Permission denied)
15/07/06 16:24:43 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
----------------------------------------------------------------
Loaded from file:/usr/local/spark-1.4.0-bin-hadoop2.6/lib/spark-assembly-1.4.0-hadoop2.6.0.jar
java.vendor=Oracle Corporation
java.runtime.version=1.8.0_45-b14
user.dir=/usr/local/spark-1.4.0-bin-hadoop2.6
os.name=Linux
os.arch=amd64
os.version=3.19.0-21-generic
derby.system.home=null
Database Class Loader started - derby.database.classpath=''
15/07/06 16:24:45 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
15/07/06 16:24:45 INFO MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5. Encountered: "@" (64), after : "".
15/07/06 16:24:46 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/07/06 16:24:46 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/07/06 16:24:47 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/07/06 16:24:47 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/07/06 16:24:47 INFO ObjectStore: Initialized ObjectStore
15/07/06 16:24:48 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.1aa
15/07/06 16:24:48 INFO HiveMetaStore: Added admin role in metastore
15/07/06 16:24:48 INFO HiveMetaStore: Added public role in metastore
15/07/06 16:24:48 INFO HiveMetaStore: No user is added in admin role, since config is empty
15/07/06 16:24:48 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr.
15/07/06 16:24:48 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
scala>
The launch log is long, so the middle portion has been elided.
Everything up to this point takes about 10 minutes.
Reference
Original article: https://qiita.com/junk1400/items/782728cee32817cc0c17