Hive----窗口函数

简介: 窗口函数

需求1:计算2017年4月份购买过的顾客及总人数

select name,count(*) from business where substring(orderdate,1,7)='2017-04' group by name;

 

select name,count(*) over(rows between unbounded preceding and unbounded following) from business where substring(orderdate,1,7)='2017-04' group by name;

 

窗口函数:函数 + 窗口

支持开窗的函数:

①windowing function:lead、lag、first_value、last_value

②聚合函数:count、sum、min、max、avg

③排名函数:RANK、ROW_NUMBER、DENSE_RANK、CUME_DIST、PERCENT_RANK、NTILE

 

函数 over(partition by 字段名 order by 字段名 window clause)

 

window clause:用来控制数据的计算范围

(ROWS | RANGE) BETWEEN (UNBOUNDED | [num]) PRECEDING AND ([num] PRECEDING | CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)

(ROWS | RANGE) BETWEEN CURRENT ROW AND (CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)

(ROWS | RANGE) BETWEEN [num] FOLLOWING AND (UNBOUNDED | [num]) FOLLOWING

 

rows between = range between

 

rows between unbounded preceding and unbounded following :从数据范围的上边界到下边界

rows between 2 preceding and unbounded following :从往前数2行到下边界

rows between current row and unbounded following :从当前行到下边界

rows between 2 preceding and current row : 从往前数两行到当前行

rows between 2 preceding and 3 following :从往前数2行到往后数3行

rows between unbounded preceding and current row :从上边界到当前行

...

...

可以任意改变数据的计算范围

 

需求2:查询顾客的购买明细及月购买总额

select *,sum(cost) over(partition by name,substring(orderdate,1,7))from business;

 

需求3:上述的场景, 将每个顾客的cost按照日期进行累加

select *,sum(cost) over(partition by name order by orderdate)from business;

注意:当开窗函数中有order by的时候,假如不写rows between,那么数据的计算范围相当于 rows between unbounded preceding and current row

first_value()last_value()sum()min()max()avg()count() order by 约束!

lag() 、lead() 和排序函数不受 order by 约束!

select *,sum(cost) over(partition by name order by orderdate rows between unbounded preceding and current row)from business;

 

需求4:查询每个顾客的上次消费时间

select *,lag(orderdate,1,'no data') over(partition by name order by orderdate) from business;

 

需求5:查询前20%时间的订单信息

select *,ntile(5)  over(order by orderdate) num from business;  --t1

select * from (select *,ntile(5)  over(order by orderdate) num from business) t1 where num=1;

 

需求6:求每个顾客的购买明细,及每个月的首次购买时间

first_value:取数据计算范围的第一个值

select *,first_value(orderdate,true) over(partition by name,substring(orderdate,1,7) order by orderdate) from business;

注:true  表示取第一个不为null 的值

 

需求7:求每个顾客的购买明细,及每个月最后一次购买时间

select *,last_value(orderdate,true) over(partition by name,substring(orderdate,1,7) order by orderdate rows between unbounded preceding and unbounded following) from business;

注:true  表示取最后一个不为null 的值

 

需求8:求每个顾客的购买明细和最近3次的购买总额

select *,sum(cost) over(partition by name order by orderdate rows between 2 preceding and current row)from business;

 

 

排名函数:

孙悟空        语文        87

孙悟空        数学        95

孙悟空        英语        68

大海        语文        94

大海        数学        56

大海        英语        84

宋宋        语文        64

宋宋        数学        86

宋宋        英语        84

婷婷        语文        65

婷婷        数学        85

婷婷        英语        78

 

rank()

允许并列排名,会跳号

dense_rank()

允许并列,不跳号

row_number()

行号,不并列,不跳号

cume_dist()

行数/分组内总行数

PERCENT_RANK()

行数 -1 /分组内总行数 -1

ntile(n)

将数据平均分配到 n 个桶中;如果不能平均分配,则优先分配较小编号的桶

补充:cume_dist() 和 ntile(n) 可以用于求前 N/% 的数据


需求:

1)计算每门学科成绩排名。

 

select *,

rank() over(partition by sub order by score desc) rank,

row_number() over(partition by sub order by score desc) row_numer,

dense_rank() over(partition by sub order by score desc) dense_rank,

cume_dist() over(partition by sub order by score desc) cume_dist,

percent_rank() over(partition by sub order by score desc) percent_rank

from testdb6.scores;

 

 

(2) 计算每个学科前三名学生

 

select *,ROW_NUMBER() over(partition by subject order by score desc) row_no  

from score;

 

select *

from

(select *,ROW_NUMBER() over(partition by subject order by score desc) row_no  

from score)t1

where row_no <4;

 

(3)计算每个学生的总分排名

①先计算总分

select name,sum(score) ss

from score

group by name;

 

 

②按总分排名

select *,

rank() over(order by ss desc)

from

(select name,sum(score) ss

from score

group by name)t1;

 

或:

select *,

dense_rank() over(order by ss desc)

from

(select name,sum(score) over(partition by name)ss

from score)t1;

 

 

(4) 求学生的成绩明细,总分和总分排名

select *,

sum(score) over(partition by name) ss,          ×

rank() over(order by ss desc)

from score;

 

select *,sum(score) over(partition by name) ss    --t1

from score;

 

select *,

dense_rank() over(order by ss desc)

from

(select *,sum(score) over(partition by name) ss  

from score)t1;

 

(5)求学生的成绩明细,并显示当前科目的最高分

select *,max(score) over(partition by sub ) ascore from scores order by ascore;

 

 

目录
相关文章
|
7月前
|
SQL HIVE
hive窗口函数应用实例
hive窗口函数应用实例
172 3
|
7月前
|
SQL HIVE
Hive 【Hive(七)窗口函数练习】
Hive 【Hive(七)窗口函数练习】
|
7月前
|
SQL 分布式计算 Serverless
Hive【Hive(六)窗口函数】
Hive【Hive(六)窗口函数】
|
SQL 大数据 开发工具
大数据Hive窗口函数应用实例 2
大数据Hive窗口函数应用实例
160 0
|
SQL 大数据 开发工具
大数据Hive窗口函数应用实例 1
大数据Hive窗口函数应用实例
97 0
|
3月前
|
SQL JavaScript 前端开发
Hive学习-lateral view 、explode、reflect和窗口函数
Hive学习-lateral view 、explode、reflect和窗口函数
47 4
|
7月前
|
SQL HIVE 索引
Hive窗口函数案例总结
Hive窗口函数案例总结
|
7月前
|
SQL
bigdata-23-Hive窗口函数
bigdata-23-Hive窗口函数
51 0
Hive学习---4、函数(单行函数、高级聚合函数、炸裂函数、窗口函数)(二)
Hive学习---4、函数(单行函数、高级聚合函数、炸裂函数、窗口函数)(二)
|
SQL HIVE
Hive 常用的窗口函数【高频重点】(下)
Hive 常用的窗口函数【高频重点】(下)
83 0