Hive syntax: converting between rows and columns

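When processing data with Hive, I often need to convert rows into columns and columns into rows, so this note collects the common scenarios and the syntax for each.
Every statement below can be copied into your own Hive session and run as-is to see the result.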
Prerequisites
Start hive or beeline and run desc function explode; to view a function's description. The functions used in this note are:

explode(a) - separates the elements of array a into multiple rows, or the elements of a map into multiple rows and columns
(explode is a built-in Hive UDTF, that is, a table-generating function.)

split(str, regex) - Splits str around occurances that match regex

collect_list(x) - Returns a list of objects with duplicates

collect_set(x) - Returns a set of objects with duplicate elements eliminated

concat_ws(separator, [string | array(string)]+) - returns the concatenation of the strings separated by the separator

max(expr) - Returns the maximum value of expr
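The descriptions above are the one-line summaries. As a small sketch, the statements below list the available functions and print the longer description of one of them; the extended output usually includes a usage example, though its exact wording depends on the Hive version.

```sql
-- List every function registered in the current session.
show functions;

-- Print the one-line summary plus the extended description, if any.
desc function extended explode;
```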

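Preparing the data
Create a final-exam score table with the columns name, subject, and score, holding each student's score in each subject.
You could also write the data to a text file and load it, but building the table with a single SQL statement is simpler and more convenient for testing.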

Data for the rows-to-columns examples, referred to below as Table 1:

create table school_final_test as
select 'jack' as name, 'english' as subject, 70 as score union all
select 'jack' as name, 'math' as subject, 80 as score union all
select 'jack' as name, 'chinese' as subject, 90 as score union all
select 'tim' as name, 'english' as subject, 10 as score union all
select 'tim' as name, 'math' as subject, 20 as score union all
select 'tim' as name, 'chinese' as subject, 30 as score;

Table 1 data:
name  subject  score
jack  english  70
jack  math     80
jack  chinese  90
tim   english  10
tim   math     20
tim   chinese  30
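The text-file route mentioned earlier might look like the sketch below. The table name school_final_test_txt and the path /tmp/school_final_test.txt are hypothetical, and the file is assumed to hold one comma-separated row per line (for example jack,english,70). With beeline, a local path is typically resolved on the HiveServer2 host rather than on your own machine.

```sql
-- Hypothetical table backed by a plain comma-delimited text file.
create table school_final_test_txt (
    name    string,
    subject string,
    score   int
)
row format delimited fields terminated by ','
stored as textfile;

-- Hypothetical path; replace it with the location of your own file.
load data local inpath '/tmp/school_final_test.txt'
overwrite into table school_final_test_txt;
```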

Data for the columns-to-rows examples, referred to below as Table 2:

create table school_final_test1 as
select 'jack' as name, 70 as english,80 as math, 90 as chinese union all
select 'tim' as name, 10 as english,20 as math, 30 as chinese;

Table 2 data:
name  english  math  chinese
jack  70       80    90
tim   10       20    30
Running the tests
Requirement
Multiple rows to multiple columns; the data source is Table 1.
Result table:
name  english  math  chinese
jack  70       80    90
tim   10       20    30

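group by + max + case when syntax

This relies on the scores being non-negative: within each group, rows for other subjects contribute 0, so max returns the score of the matching subject. Note that a student with no row for a subject also ends up with 0 in that column.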

select name,
max(case subject when 'english' then score else 0 end) as english,
max(case subject when 'math' then score else 0 end) as math,
max(case subject when 'chinese' then score else 0 end) as chinese
from school_final_test
group by name;

Question: does this still work if the values being pivoted are strings rather than numbers?

select max(str) from
    (select 'str' as str union all
    select 'sts' as str union all
    select null as str)t1; -- result : sts

For strings, null can clearly be used as the default value: max ignores nulls, so it still works.
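Applied to Table 1, a string pivot could be sketched as follows. The grade column is hypothetical and is derived here only to have a string value to pivot; the case expressions omit else, so non-matching rows yield null and max skips them.

```sql
select name,
    max(case subject when 'english' then grade end) as english,
    max(case subject when 'math'    then grade end) as math,
    max(case subject when 'chinese' then grade end) as chinese
from (
    -- grade is a hypothetical string label derived from score
    select name, subject,
           case when score >= 60 then 'pass' else 'fail' end as grade
    from school_final_test
) t1
group by name;
```

With the Table 1 data, every subject is pass for jack and fail for tim.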

Requirement
Multiple rows to a single column; the data source is Table 1.
Result table:
name  scores
jack  english:70,math:80,chinese:90
tim   english:10,math:20,chinese:30

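group by + collect_list + concat_ws syntax

Note the argument types the functions require: concat_ws accepts only strings or arrays of strings, so score must be cast to string first.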

select name,concat_ws(',',
    collect_list(
        concat_ws(':',subject,cast(score as string))
        )
    ) as scores
from school_final_test
group by name;
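One caveat: collect_list does not promise any particular order of elements within a group, so the subjects may not always appear in the order shown in the result table. If a stable order matters, one option (a sketch, not the only approach) is to sort the collected array with the built-in sort_array before joining it, which orders the entries alphabetically.

```sql
select name,
    concat_ws(',',
        sort_array(
            collect_list(concat_ws(':', subject, cast(score as string)))
        )
    ) as scores
from school_final_test
group by name;
```

For jack this should give chinese:90,english:70,math:80.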

Requirement
Multiple columns to multiple rows; the data source is Table 2.
Result table:
name  subject  score
jack  english  70
jack  math     80
jack  chinese  90
tim   english  10
tim   math     20
tim   chinese  30

Select the data needed from each column separately, then combine the results.

Note the difference between union and union all: union removes duplicate rows, union all keeps them.

select name,'english' as subject,english as score from school_final_test1 
union all 
select name,'math' as subject,math as score from school_final_test1 
union all 
select name,'chinese' as subject,chinese as score from school_final_test1;
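The union all version reads Table 2 once per column. As an alternative sketch, the columns can be packed into a map with the built-in map function and expanded with explode, which turns each key/value pair into one row. The alias t1 and the output column names are arbitrary choices for this example.

```sql
select name, t1.subject, t1.score
from school_final_test1
lateral view explode(
    map('english', english, 'math', math, 'chinese', chinese)
) t1 as subject, score;
```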

Requirement
A single column to multiple rows
Data source:

create table school_final_test2 as
select name,concat_ws(',',
    collect_list(
        concat_ws(':',subject,cast(score as string))
        )
    ) as scores
from school_final_test
group by name;

name  scores
jack  english:70,math:80,chinese:90
tim   english:10,math:20,chinese:30

Result table:
name  scores
jack  english:70
jack  math:80
jack  chinese:90
tim   english:10
tim   math:20
tim   chinese:30

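split + explode syntax

Lateral View joins the rows produced by a table-generating function (UDTF) back to the source rows, and several lateral views can be chained. If the exploded column may contain null values, add outer after view; this behaves like a left outer join, so those source rows are kept.

Here table1 is the alias of the new virtual table produced by explode. The alias is required, but you do not have to use it when referencing columns (in that case, make sure the column names are not duplicated).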

select name,table1.scores as scores
from school_final_test2
lateral view explode(split(scores,',')) table1 as scores;
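To illustrate the outer keyword, the sketch below uses a hypothetical inline row for a student named amy whose scores value is null. Without outer, explode produces no rows for her and she disappears from the result; with outer, she should be kept with a null in the exploded column.

```sql
select name, t1.score_item
from (
    -- hypothetical row with a null scores value
    select 'amy' as name, cast(null as string) as scores
) src
lateral view outer explode(split(scores, ',')) t1 as score_item;
```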

To convert the result into the Table 1 format, split the scores field again on : and then pick out each field by array index, array[index].

select name,split(scores,':')[0] as subject,split(scores,':')[1] as score from (
    select name,table1.scores as scores
    from school_final_test2
    lateral view explode(split(scores,',')) table1 as scores
)t1;
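The same Table 1 shape can probably be reached in one step by parsing each scores string into a map with str_to_map and exploding the map, which yields the key and value as two columns. This is a sketch of that alternative; the cast turns the score back into a number, since the map values are strings.

```sql
select name, t1.subject, cast(t1.score as int) as score
from school_final_test2
lateral view explode(str_to_map(scores, ',', ':')) t1 as subject, score;
```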

Requirement: given a string of the form "key1=value1&key2=value2...keyn=valuen", look up valuex by keyx.

The core idea is to convert the data in the string into a map structure, which the built-in function str_to_map can do. For more complex operations you can write a custom UDF.

str_to_map(text, delimiter1, delimiter2) - Creates a map by parsing text; Split text into key-value pairs using two delimiters. The first delimiter seperates pairs, and the second delimiter sperates key and value . If only one parameter is given, default delimiters are used: ',' as delimiter1 and '=' as delimiter2.

select result['key1'] from 
    (select str_to_map('key1=value1&key2=value2&keyn=valuen','&','=') as result)t1;
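If every pair is needed rather than a single key, the map can be expanded into rows with explode. A minimal sketch, with t2, k, and v as arbitrary alias names:

```sql
select t2.k, t2.v
from (
    select str_to_map('key1=value1&key2=value2&keyn=valuen', '&', '=') as result
) t1
lateral view explode(result) t2 as k, v;
```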


This text may be freely shared or copied, but please keep the URL of this document as the reference URL.

Licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
