需求1:计算2017年4月份购买过的顾客及总人数
select name,count(*) from business where substring(orderdate,1,7)='2017-04' group by name;
select name,count(*) over(rows between unbounded preceding and unbounded following) from business where substring(orderdate,1,7)='2017-04' group by name;
窗口函数:函数 + 窗口
支持开窗的函数:
①windowing function:lead、lag、first_value、last_value
②聚合函数:count、sum、min、max、avg
③排名函数:RANK、ROW_NUMBER、DENSE_RANK、CUME_DIST、PERCENT_RANK、NTILE
函数 over(partition by 字段名 order by 字段名 window clause)
window clause:用来控制数据的计算范围
(ROWS | RANGE) BETWEEN (UNBOUNDED | [num]) PRECEDING AND ([num] PRECEDING | CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN CURRENT ROW AND (CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN [num] FOLLOWING AND (UNBOUNDED | [num]) FOLLOWING
rows between = range between
rows between unbounded preceding and unbounded following :从数据范围的上边界到下边界
rows between 2 preceding and unbounded following :从往前数2行到下边界
rows between current row and unbounded following :从当前行到下边界
rows between 2 preceding and current row : 从往前数两行到当前行
rows between 2 preceding and 3 following :从往前数2行到往后数3行
rows between unbounded preceding and current row :从上边界到当前行
...
...
可以任意改变数据的计算范围
需求2:查询顾客的购买明细及月购买总额
select *,sum(cost) over(partition by name,substring(orderdate,1,7))from business;
需求3:上述的场景, 将每个顾客的cost按照日期进行累加
select *,sum(cost) over(partition by name order by orderdate)from business;
注意:当开窗函数中有order by的时候,假如不写rows between,那么数据的计算范围相当于 rows between unbounded preceding and current row
first_value()、last_value()、sum()、min()、max()、avg()、count() 受order by 约束!
lag() 、lead() 和排序函数不受 order by 约束!
select *,sum(cost) over(partition by name order by orderdate rows between unbounded preceding and current row)from business;
需求4:查询每个顾客的上次消费时间
select *,lag(orderdate,1,'no data') over(partition by name order by orderdate) from business;
需求5:查询前20%时间的订单信息
select *,ntile(5) over(order by orderdate) num from business; --t1
select * from (select *,ntile(5) over(order by orderdate) num from business) t1 where num=1;
需求6:求每个顾客的购买明细,及每个月的首次购买时间
first_value:取数据计算范围的第一个值
select *,first_value(orderdate,true) over(partition by name,substring(orderdate,1,7) order by orderdate) from business;
注:true 表示取第一个不为null 的值
需求7:求每个顾客的购买明细,及每个月最后一次购买时间
select *,last_value(orderdate,true) over(partition by name,substring(orderdate,1,7) order by orderdate rows between unbounded preceding and unbounded following) from business;
注:true 表示取最后一个不为null 的值
需求8:求每个顾客的购买明细和最近3次的购买总额
select *,sum(cost) over(partition by name order by orderdate rows between 2 preceding and current row)from business;
排名函数:
孙悟空 语文 87
孙悟空 数学 95
孙悟空 英语 68
大海 语文 94
大海 数学 56
大海 英语 84
宋宋 语文 64
宋宋 数学 86
宋宋 英语 84
婷婷 语文 65
婷婷 数学 85
婷婷 英语 78
rank() |
允许并列排名,会跳号 |
dense_rank() |
允许并列,不跳号 |
row_number() |
行号,不并列,不跳号 |
cume_dist() |
行数/分组内总行数 |
PERCENT_RANK() |
行数 -1 /分组内总行数 -1 |
ntile(n) |
将数据平均分配到 n 个桶中;如果不能平均分配,则优先分配较小编号的桶 |
补充:cume_dist() 和 ntile(n) 可以用于求前 N/% 的数据 |
需求:
(1)计算每门学科成绩排名。
select *,
rank() over(partition by sub order by score desc) rank,
row_number() over(partition by sub order by score desc) row_numer,
dense_rank() over(partition by sub order by score desc) dense_rank,
cume_dist() over(partition by sub order by score desc) cume_dist,
percent_rank() over(partition by sub order by score desc) percent_rank
from testdb6.scores;
(2) 计算每个学科前三名学生
select *,ROW_NUMBER() over(partition by subject order by score desc) row_no
from score;
select *
from
(select *,ROW_NUMBER() over(partition by subject order by score desc) row_no
from score)t1
where row_no <4;
(3)计算每个学生的总分排名
①先计算总分
select name,sum(score) ss
from score
group by name;
②按总分排名
select *,
rank() over(order by ss desc)
from
(select name,sum(score) ss
from score
group by name)t1;
或:
select *,
dense_rank() over(order by ss desc)
from
(select name,sum(score) over(partition by name)ss
from score)t1;
(4) 求学生的成绩明细,总分和总分排名
select *,
sum(score) over(partition by name) ss, ×
rank() over(order by ss desc)
from score;
select *,sum(score) over(partition by name) ss --t1
from score;
select *,
dense_rank() over(order by ss desc)
from
(select *,sum(score) over(partition by name) ss
from score)t1;
(5)求学生的成绩明细,并显示当前科目的最高分
select *,max(score) over(partition by sub ) ascore from scores order by ascore;