Hive DML Syntax

  • Load a file into a table
  • LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]

  • hive> load data local inpath "/home/hadoop/data/deptn.sql" overwrite into table dept;
    Loading data to table default.dept
    Table default.dept stats: [numFiles=1, numRows=0, totalSize=80, rawDataSize=0]
    OK
    Time taken: 2.401 seconds
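    The PARTITION clause in the syntax above targets a specific partition, and dropping LOCAL reads the file from HDFS instead of the local file system (a minimal sketch; the partitioned table emp_partition and the HDFS path are assumptions for illustration):
    load data inpath '/data/emp_deptno10.txt' overwrite into table emp_partition partition (deptno=10);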
    
  • Writing query results to the file system (exporting data): INSERT OVERWRITE [LOCAL] DIRECTORY directory1 [ROW FORMAT row_format] [STORED AS file_format] SELECT ... FROM ...
  • insert overwrite local directory '/home/hadoop/data/outputemp2' 
    row format delimited fields terminated by "\t"
    select * from emp; 
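    Dropping LOCAL writes the result set to an HDFS directory instead (a sketch; the target path is an assumption):
    insert overwrite directory '/tmp/hive_output_emp'
    row format delimited fields terminated by "\t"
    select * from emp;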

    FROM from_statement INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1 [INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...  // export to multiple directories in a single pass
    from emp
    INSERT OVERWRITE  LOCAL DIRECTORY '/home/hadoop/tmp/hivetmp1'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
    select empno, ename  
    INSERT OVERWRITE  LOCAL DIRECTORY '/home/hadoop/tmp/hivetmp2'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY "\t"
    select ename;   
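    The same multi-INSERT pattern can also populate tables in a single scan of the source (a minimal sketch; the target tables emp_names and emp_salaries are assumptions, not defined above):
    from emp
    INSERT OVERWRITE TABLE emp_names select empno, ename
    INSERT INTO TABLE emp_salaries select empno, sal;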

  • Exporting data with hive -e (see the sketch below)
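    Running the query non-interactively and redirecting its standard output is a lightweight way to export (a minimal sketch; the output file path is an assumption for illustration):
    hive -e "select * from emp" > /home/hadoop/data/emp_query_result.txt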
  • Query the average salary of each department: select deptno, avg(salary) from emp group by deptno;

  • select ename, deptno, avg(salary) from emp group by deptno; fails with Expression not in GROUP BY key 'ename' // mind the mapping:
    every column in the SELECT list must either appear in the GROUP BY clause or be wrapped in an aggregate function, as in the sketch below.
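    Either fix works (a hedged illustration; the choice of max(ename) is only for demonstration):
    select deptno, max(ename), avg(salary) from emp group by deptno;      -- aggregate the extra column
    select ename, deptno, avg(salary) from emp group by ename, deptno;    -- or add it to GROUP BY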
    Maximum salary for each department and job: select deptno, job, max(salary) from emp group by deptno, job;
    Departments whose average salary is greater than 2000: select deptno, avg(salary) avg_sal from emp group by deptno having avg_sal > 2000;
    Classify query results with CASE WHEN: select ename, sal, case when sal>1 and sal<=1000 then 'lower' when sal>1000 and sal<=2000 then 'moddle' when sal>2000 and sal<=4000 then 'high' else 'highest' end from emp;
    hive> select ename,sal,
        > case
        > when sal>1 and sal<=1000 then 'lower'
        > when sal>1000 and sal <=2000 then 'moddle'
        > when sal>2000 and sal <=4000 then 'high'
        > else 'highest'
        > from emp;
    FAILED: ParseException line 7:0 missing KW_END at 'from' near ''
    The CASE expression above is missing its closing END keyword; adding it lets the query parse:
    hive> select ename,sal,
        > case
        > when sal>1 and sal<=1000 then 'lower'
        > when sal>1000 and sal <=2000 then 'moddle'
        > when sal>2000 and sal <=4000 then 'high'
        > else 'highest'
        > end
        > from emp;
    OK
    SMITH   800.0   lower
    ALLEN   1600.0  moddle
    WARD    1250.0  moddle
    JONES   2975.0  high
    MARTIN  1250.0  moddle
    BLAKE   2850.0  high
    CLARK   2450.0  high
    SCOTT   3000.0  high
    KING    5000.0  highest
    TURNER  1500.0  moddle
    ADAMS   1100.0  moddle
    JAMES   950.0   lower
    FORD    3000.0  high
    MILLER  1300.0  moddle
    Time taken: 2.191 seconds, Fetched: 14 row(s)
    

    Merging query results with UNION ALL: select count(1) from emp where empno=7566 union all select count(1) from emp where empno=7654;
    hive> select count(1) from emp where empno=7566
        > union all
        > select count(1) from emp where empno=7654;
    Query ID = hadoop_20180109015050_8d760d00-2c6e-4cc4-a99e-5573e64bfc9b
    Total jobs = 3
    Launching Job 1 out of 3
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=
    Starting Job = job_1515472546059_0002, Tracking URL = http://hadoop:8088/proxy/application_1515472546059_0002/
    Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job  -kill job_1515472546059_0002
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2018-01-09 01:52:52,767 Stage-1 map = 0%,  reduce = 0%
    2018-01-09 01:53:15,692 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.93 sec
    2018-01-09 01:53:33,559 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 8.71 sec
    MapReduce Total cumulative CPU time: 8 seconds 710 msec
    Ended Job = job_1515472546059_0002
    Launching Job 2 out of 3
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=
    Starting Job = job_1515472546059_0003, Tracking URL = http://hadoop:8088/proxy/application_1515472546059_0003/
    Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job  -kill job_1515472546059_0003
    Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 1
    2018-01-09 01:53:52,732 Stage-3 map = 0%,  reduce = 0%
    2018-01-09 01:54:15,733 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 7.23 sec
    2018-01-09 01:54:37,718 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 9.75 sec
    MapReduce Total cumulative CPU time: 9 seconds 750 msec
    Ended Job = job_1515472546059_0003
    Launching Job 3 out of 3
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_1515472546059_0004, Tracking URL = http://hadoop:8088/proxy/application_1515472546059_0004/
    Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.7.0/bin/hadoop job  -kill job_1515472546059_0004
    Hadoop job information for Stage-2: number of mappers: 2; number of reducers: 0
    2018-01-09 01:54:56,536 Stage-2 map = 0%,  reduce = 0%
    2018-01-09 01:55:21,218 Stage-2 map = 50%,  reduce = 0%, Cumulative CPU 2.49 sec
    2018-01-09 01:55:22,329 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 4.81 sec
    MapReduce Total cumulative CPU time: 4 seconds 810 msec
    Ended Job = job_1515472546059_0004
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 8.71 sec   HDFS Read: 7881 HDFS Write: 114 SUCCESS
    Stage-Stage-3: Map: 1  Reduce: 1   Cumulative CPU: 9.75 sec   HDFS Read: 7886 HDFS Write: 114 SUCCESS
    Stage-Stage-2: Map: 2   Cumulative CPU: 4.81 sec   HDFS Read: 5348 HDFS Write: 4 SUCCESS
    Total MapReduce CPU Time Spent: 23 seconds 270 msec
    OK
    1
    1
    Time taken: 179.507 seconds, Fetched: 2 row(s)
    
    
    

    EXPORT and IMPORT move a table's metadata together with its data, so a table can be migrated between different Hadoop clusters; this gives the exported data portability.
  • export: EXPORT TABLE tablename [PARTITION (part_column="value"[, ...])] TO 'export_target_path' [ FOR replication('eventid') ]
  • import

  • IMPORT [[EXTERNAL] TABLE new_or_original_tablename [PARTITION (part_column="value"[, ...])]] FROM 'source_path' [LOCATION 'import_target_path']
  • Example, exporting and importing a table: export table emp to '/emp/emp.sql'; import table new_emp from '/emp/emp.sql';

  • Exporting and importing a partitioned table: export table emp_dy_partition partition(deptno=30) to '/exprt'; import table new_emp_dy partition(deptno=30) from '/exprt';
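  • The [EXTERNAL] and [LOCATION] options of IMPORT restore the data as an external table at a chosen path (a minimal sketch; the table name and target location are assumptions for illustration):
    import external table new_emp_ext from '/emp/emp.sql' location '/emp/new_emp_ext';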