Installing Spark on GCE — in an Ubuntu environment, of course.

Installing Spark on Google Compute Engine



Rather than using one of GCE's quick-deploy templates, we create a fresh instance and install Spark on it ourselves.

Prerequisites


  • Local environment is Ubuntu
  • gcloud is already configured

    Procedure (creating the instance)



    1. On the VM instances screen, click the New instance button.


    2. Pick a machine type and the other settings as you see fit; for the boot disk, Ubuntu of course — whichever version you prefer.


    3. For the newly created instance, select the gcloud option under Connect.


    4. Paste the command it shows into a terminal on your local Ubuntu machine.


    You can now connect from your local machine to the newly created GCE Ubuntu instance.
    All of the following steps are run on the VM you connected to via gcloud.
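The console clicks above can also be expressed as a single gcloud command run from the local machine. The command below is only printed, not executed, so you can review it first; the instance name, zone, machine type, and image family are placeholders I am assuming, not values from the original setup, and flag names can differ between gcloud versions:

```shell
# Print (not run) a gcloud equivalent of the console steps above.
# Everything below is a placeholder -- adjust name, zone, machine type, image.
cat <<'EOF'
gcloud compute instances create spark-test \
    --zone=asia-east1-a \
    --machine-type=n1-standard-2 \
    --image-family=ubuntu-1404-lts \
    --image-project=ubuntu-os-cloud
EOF
```

Once the instance exists, `gcloud compute ssh spark-test` gets you the same connection as the command pasted from the console.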

    Procedure (installation)



    1. Install Java 8
    The stock Ubuntu VM comes without Java, so install it.
    $ sudo add-apt-repository ppa:webupd8team/java
    $ sudo apt-get update
    $ sudo apt-get install oracle-java8-installer
    

    Answer [ENTER], [Y], [OK], [yes] and so on as prompted to get it installed.
    Once the install finishes, check it first:
    junk@instance-2:~$ java -version
    java version "1.8.0_45"
    Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
    Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
    junk@instance-2:~$ 
    

    Looks OK.
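If you would rather not babysit the prompts, debconf can pre-accept the Oracle license so the install runs unattended. This is a widely used trick with the webupd8team PPA rather than something from the steps above; double-check the debconf key on your system before relying on it:

```shell
# Pre-seed the license acceptance so apt-get asks no interactive questions.
# 'shared/accepted-oracle-license-v1-1' is the key the webupd8team package reads.
echo debconf shared/accepted-oracle-license-v1-1 select true | sudo debconf-set-selections
sudo apt-get install -y oracle-java8-installer
```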

    2. Download Scala
    $ cd ~
    $ mkdir dl
    $ cd dl
    $ wget http://www.scala-lang.org/files/archive/scala-2.11.7.tgz
    
    --2015-07-06 16:04:20--  http://www.scala-lang.org/files/archive/scala-2.11.7.tgz
    Resolving www.scala-lang.org (www.scala-lang.org)... 128.178.154.159
    Connecting to www.scala-lang.org (www.scala-lang.org)|128.178.154.159|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 28460530 (27M) [application/x-gzip]
    Saving to: ‘scala-2.11.7.tgz’
    
    scala-2.11.7.tgz                100%[======================================================>]  27.14M  5.57MB/s   in 8.3s   
    
    2015-07-06 16:04:29 (3.27 MB/s) - ‘scala-2.11.7.tgz’ saved [28460530/28460530]
    

    3. Extract it
    $ tar -xzvf scala-2.11.7.tgz

    4. Copy the extracted Scala into place and create a symlink as well.
    $ cd /usr/local/
    $ sudo cp -r ~/dl/scala-2.11.7 .
    $ sudo ln -sv scala-2.11.7/ scala
    ‘scala’ -> ‘scala-2.11.7/’
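The versioned-directory-plus-symlink layout used here pays off at upgrade time: everything else references /usr/local/scala, and only the link ever moves. A self-contained sketch of the pattern (done in a temp dir, so it is safe to run anywhere):

```shell
# Demonstrate the versioned-dir + symlink pattern from steps 4 and 7.
tmp=$(mktemp -d)
mkdir "$tmp/scala-2.11.7"
ln -s scala-2.11.7 "$tmp/scala"   # scala -> scala-2.11.7
readlink "$tmp/scala"             # prints: scala-2.11.7
# A later upgrade only repoints the link; no other paths change:
mkdir "$tmp/scala-2.11.8"
ln -sfn scala-2.11.8 "$tmp/scala"
readlink "$tmp/scala"             # prints: scala-2.11.8
rm -rf "$tmp"
```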
    

    5. Download Spark
    $ cd ~/dl
    $ wget http://archive.apache.org/dist/spark/spark-1.4.0/spark-1.4.0-bin-hadoop2.6.tgz
    
    --2015-07-06 16:11:16--  http://archive.apache.org/dist/spark/spark-1.4.0/spark-1.4.0-bin-hadoop2.6.tgz
    Resolving archive.apache.org (archive.apache.org)... 192.87.106.229, 140.211.11.131, 2001:610:1:80bc:192:87:106:229
    Connecting to archive.apache.org (archive.apache.org)|192.87.106.229|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 250194134 (239M) [application/x-tar]
    Saving to: ‘spark-1.4.0-bin-hadoop2.6.tgz’
    
    spark-1.4.0-bin-hadoop2.6.tgz   100%[======================================================>] 238.60M  6.62MB/s   in 45s    
    
    2015-07-06 16:12:02 (5.32 MB/s) - ‘spark-1.4.0-bin-hadoop2.6.tgz’ saved [250194134/250194134]
    
    

    6. Extract it
    $ tar -xzvf spark-1.4.0-bin-hadoop2.6.tgz

    7. Copy the extracted Spark into place and make the link too (ditto)
    $ cd /usr/local/
    $ sudo cp -r ~/dl/spark-1.4.0-bin-hadoop2.6 .
    $ sudo ln -sv spark-1.4.0-bin-hadoop2.6/ spark
    ‘spark’ -> ‘spark-1.4.0-bin-hadoop2.6/’
    

    8. Set up paths
    $ vi ~/.bashrc  
    

    Append the following at the end of .bashrc:
    export SCALA_HOME=/usr/local/scala
    export SPARK_HOME=/usr/local/spark
    export PATH=$SCALA_HOME/bin:$PATH
    

    Reload it:
    $ source ~/.bashrc
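A quick way to confirm the variables took effect. The three exports are repeated here only so the check is self-contained; normally sourcing ~/.bashrc provides them:

```shell
# Recreate the .bashrc environment and verify PATH picked up Scala.
export SCALA_HOME=/usr/local/scala
export SPARK_HOME=/usr/local/spark
export PATH="$SCALA_HOME/bin:$PATH"

echo "$SPARK_HOME"                       # prints: /usr/local/spark
case ":$PATH:" in
  *":$SCALA_HOME/bin:"*) echo "PATH OK" ;;
  *)                     echo "PATH is missing $SCALA_HOME/bin" ;;
esac
```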
    

    9. Launch
    $ cd $SPARK_HOME
    $ ./bin/spark-shell
    
    log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    15/07/06 16:24:33 INFO SecurityManager: Changing view acls to: junk
    15/07/06 16:24:33 INFO SecurityManager: Changing modify acls to: junk
    15/07/06 16:24:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(junk); users with modify permissions: Set(junk)
    15/07/06 16:24:33 INFO HttpServer: Starting HTTP Server
    15/07/06 16:24:33 INFO Utils: Successfully started service 'HTTP class server' on port 45846.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
          /_/
    
    Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45)
    Type in expressions to have them evaluated.
    Type :help for more information.
    15/07/06 16:24:38 INFO SparkContext: Running Spark version 1.4.0
    15/07/06 16:24:38 INFO SecurityManager: Changing view acls to: junk
    15/07/06 16:24:38 INFO SecurityManager: Changing modify acls to: junk
    15/07/06 16:24:38 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(junk); users with modify permissions: Set(junk)
    15/07/06 16:24:39 INFO Slf4jLogger: Slf4jLogger started
    15/07/06 16:24:39 INFO Remoting: Starting remoting
    Mon Jul 06 16:24:42 UTC 2015 Thread[main,5,main] java.io.FileNotFoundException: derby.log (Permission denied)
    15/07/06 16:24:43 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    ----------------------------------------------------------------
    Loaded from file:/usr/local/spark-1.4.0-bin-hadoop2.6/lib/spark-assembly-1.4.0-hadoop2.6.0.jar
    java.vendor=Oracle Corporation
    java.runtime.version=1.8.0_45-b14
    user.dir=/usr/local/spark-1.4.0-bin-hadoop2.6
    os.name=Linux
    os.arch=amd64
    os.version=3.19.0-21-generic
    derby.system.home=null
    Database Class Loader started - derby.database.classpath=''
    15/07/06 16:24:45 INFO ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
    15/07/06 16:24:45 INFO MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5.  Encountered: "@" (64), after : "".
    15/07/06 16:24:46 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    15/07/06 16:24:46 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    15/07/06 16:24:47 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    15/07/06 16:24:47 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    15/07/06 16:24:47 INFO ObjectStore: Initialized ObjectStore
    15/07/06 16:24:48 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.1aa
    15/07/06 16:24:48 INFO HiveMetaStore: Added admin role in metastore
    15/07/06 16:24:48 INFO HiveMetaStore: Added public role in metastore
    15/07/06 16:24:48 INFO HiveMetaStore: No user is added in admin role, since config is empty
    15/07/06 16:24:48 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr.
    15/07/06 16:24:48 INFO SparkILoop: Created sql context (with Hive support)..
    SQL context available as sqlContext.
    
    scala> 
    

    The startup log is long, so the middle has been abbreviated. Note that spark-shell reports Scala 2.10.4 rather than the 2.11.7 we installed: the prebuilt Spark 1.4.0 binary bundles its own Scala, independent of the local install.
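With the scala> prompt up, a one-line job makes a reasonable smoke test. The non-interactive variant below assumes $SPARK_HOME from step 8 and a working install; it is a sketch, not something verified here:

```shell
# At the scala> prompt inside spark-shell:
#   scala> sc.parallelize(1 to 100).sum    // sums 1..100, i.e. 5050
# Or pipe the same expression in from the shell (requires the install above):
echo 'println(sc.parallelize(1 to 100).sum)' | "$SPARK_HOME/bin/spark-shell"
```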

    All of the above takes about 10 minutes.
