十、容器--反转内容
表名:t10
表字段及内容:
a AB,CA,BAD BD,EA
问题一:反转逗号分隔的数据:改变顺序,内容不变
输出结果如下所示:
BAD,CA,AB EA,BD
参考答案:
select a, concat_ws(",",collect_list(reverse(str))) from ( select a, str from t10 lateral view explode(split(reverse(a),",")) t as str ) tmp1 group by a;
问题二:反转逗号分隔的数据:改变内容,顺序不变
输出结果如下所示:
BA,AC,DAB DB,AE
参考答案:
select a, concat_ws(",",collect_list(reverse(str))) from ( select a, str from t10 lateral view explode(split(a,",")) t as str ) tmp1 group by a;
十一、多容器--成对提取数据
表名:t11
表字段及内容:
a b A/B 1/3 B/C/D 4/5/2
问题一:成对提取数据,字段一一对应
输出结果如下所示:
a b A 1 B 3 B 4 C 5 D 2
参考答案:
select a_inx, b_inx from ( select a, b, a_id, a_inx, b_id, b_inx from t11 lateral view posexplode(split(a,'/')) t as a_id,a_inx lateral view posexplode(split(b,'/')) t as b_id,b_inx ) tmp where a_id=b_id;
十二、多容器--转多行
表名:t12
表字段及内容:
a b c 001 A/B 1/3/5 002 B/C/D 4/5
问题一:转多行
输出结果如下所示:
a d e 001 type_b A 001 type_b B 001 type_c 1 001 type_c 3 001 type_c 5 002 type_b B 002 type_b C 002 type_b D 002 type_c 4 002 type_c 5
参考答案:
select a, d, e from ( select a, "type_b" as d, str as e from t12 lateral view explode(split(b,"/")) t as str union all select a, "type_c" as d, str as e from t12 lateral view explode(split(c,"/")) t as str ) tmp order by a,d;
十三、抽象分组--断点排序
表名:t13
表字段及内容:
a b 2014 1 2015 1 2016 1 2017 0 2018 0 2019 -1 2020 -1 2021 -1 2022 1 2023 1
问题一:断点排序
输出结果如下所示:
a b c 2014 1 1 2015 1 2 2016 1 3 2017 0 1 2018 0 2 2019 -1 1 2020 -1 2 2021 -1 3 2022 1 1 2023 1 2
参考答案:
select a, b, row_number() over( partition by b,repair_a order by a asc) as c--按照b列和[b的组首]分组,排序 from ( select a, b, a-b_rn as repair_a--根据b列值出现的次序,修复a列值为b首次出现的a列值,称为b的[组首] from ( select a, b, row_number() over( partition by b order by a asc ) as b_rn--按b列分组,按a列排序,得到b列各值出现的次序 from t13 )tmp1 )tmp2--注意,如果不同的b列值,可能出现同样的组首值,但组首值需要和a列值 一并参与分组,故并不影响排序。 order by a asc;
十四、业务逻辑的分类与抽象--时效
日期表:d_date
表字段及内容:
date_id is_work 2017-04-13 1 2017-04-14 1 2017-04-15 0 2017-04-16 0 2017-04-17 1
工作日:周一至周五09:30-18:30
客户申请表:t14
表字段及内容:
a b c 1 申请 2017-04-14 18:03:00 1 通过 2017-04-17 09:43:00 2 申请 2017-04-13 17:02:00 2 通过 2017-04-15 09:42:00
问题一:计算上表中从申请到通过占用的工作时长
输出结果如下所示:
a d 1 0.67h 2 10.67h
参考答案:
select a, round(sum(diff)/3600,2) as d from ( select a, apply_time, pass_time, dates, rn, ct, is_work, case when is_work=1 and rn=1 then unix_timestamp(concat(dates,' 18:30:00'),'yyyy-MM-dd HH:mm:ss')-unix_timestamp(apply_time,'yyyy-MM-dd HH:mm:ss') when is_work=0 then 0 when is_work=1 and rn=ct then unix_timestamp(pass_time,'yyyy-MM-dd HH:mm:ss')-unix_timestamp(concat(dates,' 09:30:00'),'yyyy-MM-dd HH:mm:ss') when is_work=1 and rn!=ct then 9*3600 end diff from ( select a, apply_time, pass_time, time_diff, day_diff, rn, ct, date_add(start,rn-1) dates from ( select a, apply_time, pass_time, time_diff, day_diff, strs, start, row_number() over(partition by a) as rn, count(*) over(partition by a) as ct from ( select a, apply_time, pass_time, time_diff, day_diff, substr(repeat(concat(substr(apply_time,1,10),','),day_diff+1),1,11*(day_diff+1)-1) strs from ( select a, apply_time, pass_time, unix_timestamp(pass_time,'yyyy-MM-dd HH:mm:ss')-unix_timestamp(apply_time,'yyyy-MM-dd HH:mm:ss') time_diff, datediff(substr(pass_time,1,10),substr(apply_time,1,10)) day_diff from ( select a, max(case when b='申请' then c end) apply_time, max(case when b='通过' then c end) pass_time from t14 group by a ) tmp1 ) tmp2 ) tmp3 lateral view explode(split(strs,",")) t as start ) tmp4 ) tmp5 join d_date on tmp5.dates = d_date.date_id ) tmp6 group by a;
十五、时间序列--进度及剩余
表名:t15
表字段及内容:
date_id is_work 2017-07-30 0 2017-07-31 1 2017-08-01 1 2017-08-02 1 2017-08-03 1 2017-08-04 1 2017-08-05 0 2017-08-06 0 2017-08-07 1
问题一:求每天的累计周工作日,剩余周工作日
输出结果如下所示:
date_id week_to_work week_left_work 2017-07-31 1 4 2017-08-01 2 3 2017-08-02 3 2 2017-08-03 4 1 2017-08-04 5 0 2017-08-05 5 0 2017-08-06 5 0
参考答案:
此处给出两种解法,其一:
select date_id ,case date_format(date_id,'u') when 1 then 1 when 2 then 2 when 3 then 3 when 4 then 4 when 5 then 5 when 6 then 5 when 7 then 5 end as week_to_work ,case date_format(date_id,'u') when 1 then 4 when 2 then 3 when 3 then 2 when 4 then 1 when 5 then 0 when 6 then 0 when 7 then 0 end as week_to_work from t15
其二:
select date_id, week_to_work, week_sum_work-week_to_work as week_left_work from( select date_id, sum(is_work) over(partition by year,week order by date_id) as week_to_work, sum(is_work) over(partition by year,week) as week_sum_work from( select date_id, is_work, year(date_id) as year, weekofyear(date_id) as week from t15 ) ta ) tb order by date_id;
十六、时间序列--构造日期
问题一:直接使用SQL实现一张日期维度表,包含以下字段:
date string 日期 d_week string 年内第几周 weeks int 周几 w_start string 周开始日 w_end string 周结束日 d_month int 第几月 m_start string 月开始日 m_end string 月结束日 d_quarter int 第几季 q_start string 季开始日 q_end string 季结束日 d_year int 年份 y_start string 年开始日 y_end string 年结束日
参考答案:
drop table if exists dim_date; create table if not exists dim_date( `date` string comment '日期', d_week string comment '年内第几周', weeks string comment '周几', w_start string comment '周开始日', w_end string comment '周结束日', d_month string comment '第几月', m_start string comment '月开始日', m_end string comment '月结束日', d_quarter int comment '第几季', q_start string comment '季开始日', q_end string comment '季结束日', d_year int comment '年份', y_start string comment '年开始日', y_end string comment '年结束日' ); --自然月: 指每月的1号到那个月的月底,它是按照阳历来计算的。就是从每月1号到月底,不管这个月有30天,31天,29天或者28天,都算是一个自然月。 insert overwrite table dim_date select `date` , d_week --年内第几周 , case weekid when 0 then '周日' when 1 then '周一' when 2 then '周二' when 3 then '周三' when 4 then '周四' when 5 then '周五' when 6 then '周六' end as weeks -- 周 , date_add(next_day(`date`,'MO'),-7) as w_start --周一 , date_add(next_day(`date`,'MO'),-1) as w_end -- 周日_end -- 月份日期 , concat('第', monthid, '月') as d_month , m_start , m_end -- 季节 , quarterid as d_quart , concat(d_year, '-', substr(concat('0', (quarterid - 1) * 3 + 1), -2), '-01') as q_start --季开始日 , date_sub(concat(d_year, '-', substr(concat('0', (quarterid) * 3 + 1), -2), '-01'), 1) as q_end --季结束日 -- 年 , d_year , y_start , y_end from ( select `date` , pmod(datediff(`date`, '2012-01-01'), 7) as weekid --获取周几 , cast(substr(`date`, 6, 2) as int) as monthid --获取月份 , case when cast(substr(`date`, 6, 2) as int) <= 3 then 1 when cast(substr(`date`, 6, 2) as int) <= 6 then 2 when cast(substr(`date`, 6, 2) as int) <= 9 then 3 when cast(substr(`date`, 6, 2) as int) <= 12 then 4 end as quarterid --获取季节 可以直接使用 quarter(`date`) , substr(`date`, 1, 4) as d_year -- 获取年份 , trunc(`date`, 'YYYY') as y_start --年开始日 , date_sub(trunc(add_months(`date`, 12), 'YYYY'), 1) as y_end --年结束日 , date_sub(`date`, dayofmonth(`date`) - 1) as m_start --当月第一天 , last_day(date_sub(`date`, dayofmonth(`date`) - 1)) m_end --当月最后一天 , weekofyear(`date`) as d_week --年内第几周 from ( -- '2021-04-01'是开始日期, '2022-03-31'是截止日期 select date_add('2021-04-01', t0.pos) as `date` from ( select posexplode( split( repeat('o', datediff( from_unixtime(unix_timestamp('2022-03-31', 'yyyy-mm-dd'), 'yyyy-mm-dd'), '2021-04-01')), 'o' ) ) ) t0 ) t1 ) t2;