[원본 공유] HIVE 표 데이터 양 통계 & 메 일

개요:
HIVE BI 라 이브 러 리 에서 매일 데이터 시트 의 총 크기 와 증 가 를 계산 합 니 다.
출력:
总大小:xxxG
日同比新增数据量:xxxG
周同比新增数据量:xxxG
月同比新增数据量:xxxG

总表数:xxx
日新增表数:xxx
周新增表数:xxx
月新增表数:xxx

最大的20张表:
......

表数据增量TOP20:
......

코드: bireport.sh
#!/bin/bash

GIGA=1000000000

content="
" prefix='bi' today=`date -d"-1 day" +%Y-%m-%d` yestoday=`date -d"-2 day" +%Y-%m-%d` lastweek=`date -d"-8 day" +%Y-%m-%d` lastmonth=`date -d"-1 month -1 day" +%Y-%m-%d` #此处需要做重试,否则,当有数据被删除的时候,统计失败 hadoop fs -du /user/hive/warehouse/bi.db/  > ./tmp.bi.db.stat while [ $? -ne 0 ] do         hadoop fs -du /user/hive/warehouse/bi.db/  > ./tmp.bi.db.stat done #dpstg开头的表不做统计 grep -v dpstg ./tmp.bi.db.stat > bi.db.stat  awk '{if(NR!=1) {size=split($2,table_list,"/"); print table_list[size],"\t",$1}}' ./bi.db.stat > $prefix.$today sum=`awk '{sum+=$2} END{printf("%d",sum/"'$GIGA'")}' $prefix.$today` content+="总大小:"$sum"G
" yes_sum=`awk '{sum+=$2} END{print sum/"'$GIGA'"}' $prefix.$yestoday` diff_size=`echo "$sum-$yes_sum" | bc` content+="日同比新增数据量:"$diff_size"G
" week_sum=`awk '{sum+=$2} END{print sum/"'$GIGA'"}' $prefix.$lastweek` diff_size=`echo "$sum-$week_sum" | bc` content+="周同比新增数据量:"$diff_size"G
" month_sum=`awk '{sum+=$2} END{print sum/"'$GIGA'"}' $prefix.$lastmonth` diff_size=`echo "$sum-$month_sum" | bc` content+="月同比新增数据量:"$diff_size"G
" table_count=`wc -l $prefix.$today | awk '{print $1}'` content+="

总表数:"$table_count"
" yes_table_count=`wc -l $prefix.$yestoday | awk '{print $1}'` diff_table_count=`echo "$table_count-$yes_table_count" | bc` content+="日新增表数:"$diff_table_count"
" lastweek_table_count=`wc -l $prefix.$lastweek | awk '{print $1}'` diff_table_count=`echo "$table_count-$lastweek_table_count" | bc` content+="周新增表数:"$diff_table_count"
" lastmonth_table_count=`wc -l $prefix.$lastmonth | awk '{print $1}'` diff_table_count=`echo "$table_count-$lastmonth_table_count" | bc` content+="月新增表数:"$diff_table_count"
" max_table="`sort -r -n -k 2 $prefix.$today | head -20 | awk '{ORS="";print $1":" $2/"'$GIGA'""G\\\
"}'`" content+="

最大的20张表:
"$max_table"
" declare -A ARR1 while read line do key=`echo $line|awk '{print $1}'` value=`echo $line|awk '{print $2}'` ARR1["$key"]=$value done < $prefix.$today declare -A ARR2 while read line do key=`echo $line|awk '{print $1}'` value=`echo $line|awk '{print $2}'` ARR2["$key"]=$value done < $prefix.$yestoday for k in "${!ARR1[@]}" do d_val=`echo "${ARR1[${k}]}-${ARR2[${k}]}"|bc` ARR1["$k"]=$d_val done cat /dev/null > bi_tmp.txt for k in "${!ARR1[@]}" do echo "$k ${ARR1[${k}]}" >> bi_tmp.txt done max_incre_table="`sort -r -n -k 2 bi_tmp.txt | head -20 | awk '{ORS="";print $1":" $2/"'$GIGA'""G\\\
"}'`" content+="

表数据增量TOP20:
"$max_incre_table"
" #echo -e $content msg=`echo $content` python mail.py "HIVE BI每日统计" $msg

mail.py
#! /usr/bin/python

import sys
import optparse
import os
import smtplib
from email.mime.text import MIMEText

## mail server config
mail_host="mail.51ping.com"
mail_user="info"
mail_pass="chinese1"
mail_postfix="51ping.com"
######################

def send_mail(to_list,sub,content):
    me=mail_user+"<"+mail_user+"@"+mail_postfix+">"
    msg = MIMEText(content)
    msg['Subject'] = sub
    msg['From'] = me
    msg['To'] = ";".join(to_list)
    try:
        s = smtplib.SMTP()
        s.connect(mail_host)
        s.login(mail_user,mail_pass)
        s.sendmail(me, to_list, msg.as_string())
        s.close()
        return True
    except Exception, e:
        print str(e)
        return False
mail_to_list=["[email protected]"]

msg=sys.argv[2]
msg=msg.replace("\
","\r
") print msg send_mail(mail_to_list,sys.argv[1],msg)

실행:
sh bi_report.sh

좋은 웹페이지 즐겨찾기