Create Hive tables from Spark + HWC and have their metadata automatically reflected in Atlas.


Summary



On HDP 3.1.x, there is a way to create Hive tables from Spark + HWC and have their metadata automatically reflected in Atlas.


Steps:

1) Atlas + Hive integration
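Hive publishes its metadata events to Atlas through the Atlas Hive hook, so the hook must be enabled before anything created via HiveServer2 shows up in Atlas. A minimal sketch of the relevant setting, assuming a standard HDP 3.1 cluster where the hook is enabled in Ambari (Services > Hive > Configs > Enable Atlas Hook):

    # hive-site.xml (set by Ambari's "Enable Atlas Hook" checkbox):
    # post-execution hook that publishes Hive DDL events to Atlas
    hive.exec.post.hooks=org.apache.atlas.hive.hook.HiveHook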





2) HWC setup



Prerequisites: Hive Warehouse Connector (HWC) and low-latency analytical processing (LLAP)
See the HDP 3.1 documentation: docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1...
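On an HDP 3.1.4 node the HWC assembly jar ships under /usr/hdp/current/hive_warehouse_connector/ and is passed to spark-shell with --jars, as in the transcript in step 4 (the version suffix in the file name varies with the cluster):

    spark-shell --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.4.0-315.jar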

3) Spark - Hive integration



Add the following Spark settings. Set the values of these properties as follows:
spark.sql.hive.hiveserver2.jdbc.url

In Ambari, copy the value from Services > Hive > Summary > HIVESERVER2 INTERACTIVE JDBC URL.
spark.datasource.hive.warehouse.metastoreUri

Copy the value from hive.metastore.uris. In Hive, at the hive> prompt, enter set hive.metastore.uris and copy the output. For example, thrift://mycluster-1.com:9083.
spark.hadoop.hive.llap.daemon.service.hosts

Copy the value from Advanced hive-interactive-site > hive.llap.daemon.service.hosts.
spark.hadoop.hive.zookeeper.quorum

Copy the value from Advanced hive-site > hive.zookeeper.quorum.

Example:
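A hypothetical spark-defaults.conf fragment; the host names and ports are invented for illustration (reusing mycluster-1.com from above), the JDBC URL uses the ZooKeeper service-discovery format that HiveServer2 Interactive reports in Ambari, and @llap0 is the typical default LLAP application name:

    spark.sql.hive.hiveserver2.jdbc.url jdbc:hive2://mycluster-1.com:2181,mycluster-2.com:2181,mycluster-3.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive
    spark.datasource.hive.warehouse.metastoreUri thrift://mycluster-1.com:9083
    spark.hadoop.hive.llap.daemon.service.hosts @llap0
    spark.hadoop.hive.zookeeper.quorum mycluster-1.com:2181,mycluster-2.com:2181,mycluster-3.com:2181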



4) Verifying the behavior



References in the HDP 3.1 documentation (docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1...):
  • Integrating Apache Hive with Apache Spark - Hive Warehouse Connector
  • [centos@zzeng-hdp-1 ~/git/ops/hwx-field-cloud/hdp]$ spark-shell --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.4.0-315.jar
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    19/11/30 04:21:12 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
    Spark context Web UI available at http://zzeng-hdp-1.field.hortonworks.com:4041
    Spark context available as 'sc' (master = yarn, app id = application_1575083036450_0018).
    Spark session available as 'spark'.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.3.2.3.1.4.0-315
          /_/
    
    Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
    Type in expressions to have them evaluated.
    Type :help for more information.
    
    scala> import com.hortonworks.hwc.HiveWarehouseSession
    import com.hortonworks.hwc.HiveWarehouseSession
    
    scala> import com.hortonworks.hwc.HiveWarehouseSession._
    import com.hortonworks.hwc.HiveWarehouseSession._
    
    scala> val hive = HiveWarehouseSession.session(spark).build()
    hive: com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl = com.hortonworks.spark.sql.hive.llap.HiveWarehouseSessionImpl@7b88dd58
    
    scala> hive.createDatabase("zzeng3", false);
    
    scala> hive.setDatabase("zzeng3")
    
    scala> hive.createTable("web_sales").ifNotExists().column("sold_time_sk", "bigint").column("ws_ship_date_sk", "bigint").create()
    
    scala>
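The same steps as a self-contained job rather than a REPL session — a minimal sketch; the object name is hypothetical, and the INSERT/SELECT at the end are additions to show executeUpdate/executeQuery and are not part of the original transcript:

    import org.apache.spark.sql.SparkSession
    import com.hortonworks.hwc.HiveWarehouseSession
    import com.hortonworks.hwc.HiveWarehouseSession._

    object HwcAtlasDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("HwcAtlasDemo").getOrCreate()

        // DDL issued through the HWC session goes to HiveServer2 Interactive,
        // whose Atlas hook then publishes the metadata to Atlas.
        val hive = HiveWarehouseSession.session(spark).build()

        hive.createDatabase("zzeng3", false)  // second argument as in the transcript above
        hive.setDatabase("zzeng3")
        hive.createTable("web_sales")
          .ifNotExists()
          .column("sold_time_sk", "bigint")
          .column("ws_ship_date_sk", "bigint")
          .create()

        // Not in the original transcript: write and read back a sample row via LLAP
        hive.executeUpdate("INSERT INTO web_sales VALUES (1, 2)")
        hive.executeQuery("SELECT * FROM web_sales").show()

        spark.stop()
      }
    }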
    

Atlas display:
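If the hook fired, searching Atlas for the hive_table type should return web_sales, with its columns and a link to the hive_db entity for zzeng3.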




Limitations (from the HDP 3.1 documentation, "HiveWarehouseConnector for handling Apache Spark data", docs.cloudera.com):
1) Only ORC tables are supported.
2) The Spark Thrift Server is not supported.
