窗口函数:
支持开窗的函数:
1、Windowing functions
lead :LEAD (scalar_expression [,offset] [,default]) 查询某一列往后数第offset的值,如果找不到,就取default
lag :LEAD (scalar_expression [,offset] [,default]) 查询某一列往前数第offset的值,如果找不到,就取default
first_value :取数据集第一行的值
last_value :取数据集最后一行的值
2、聚合函数
count
sum
min
max
avg
3、分析函数
RANK
ROW_NUMBER
DENSE_RANK
CUME_DIST
PERCENT_RANK
NTILE :nile(n),将数据集平均分为n份,给每份从1开始标记一个数字
格式:函数名 over(partition by 字段 order by 字段 rows between ... and ...)
(ROWS | RANGE) BETWEEN (UNBOUNDED | [num]) PRECEDING AND ([num] PRECEDING | CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN CURRENT ROW AND (CURRENT ROW | (UNBOUNDED | [num]) FOLLOWING)
(ROWS | RANGE) BETWEEN [num] FOLLOWING AND (UNBOUNDED | [num]) FOLLOWING
上无边界:unbounded preceding
前面n行:n preceding
当前行:current row
往后n行:n following
下无边界:unbounded following
jack 2017-01-01 10
tony 2017-01-02 15
jack 2017-02-03 23
tony 2017-01-04 29
jack 2017-01-05 46
jack 2017-04-06 42
tony 2017-01-07 50
jack 2017-01-08 55
mart 2017-04-08 62
mart 2017-04-09 68
neil 2017-05-10 12
mart 2017-04-11 75
neil 2017-06-12 80
mart 2017-04-13 94
(2)查询顾客的购买明细及月购买总额
select *,sum(cost) over(partition by name,substring(orderdate,1,7) rows between unbounded preceding and unbounded following)
from business;
(3)上述的场景, 将每个顾客的cost按照日期进行累加
1 2 2
2 4 6
3 1 7
4 7 14
select *,
sum(cost) over(partition by name order by orderdate rows between unbounded preceding and current row)
from business;
(4)查看顾客上次的购买时间
select *,
lag(orderdate,1,-1) over(partition by name order by orderdate)
from business;
select *,
lead(orderdate,1,-1) over(partition by name order by orderdate desc)
from business;
(5)查询前20%时间的订单信息
select *,
ntile(5) over(order by orderdate) no
from business;
with t1 as(select *,
ntile(5) over(order by orderdate) no
from business)
select * from t1 where no=1;
需求6:求每个顾客的购买明细,及每个月的首次购买时间
select *,first_value(orderdate) over(partition by name,substring(orderdate,1,7) order by orderdate)
from business;
select *,last_value(orderdate) over(partition by name,substring(orderdate,1,7) order by orderdate desc) ×
from business;
select *,last_value(orderdate) over(partition by name,substring(orderdate,1,7) order by orderdate desc
rows between unbounded preceding and unbounded following)
from business;
注意:
当over()中有order by 没有rows between时,默认的取值范围是rows between unbounded preceding and current row,lead和lag函数没有受此约束
当over()中没有order by,也没有rows between时,默认的取值范围是rows between unbounded preceding and unbounded following
需求7:求每个顾客的购买明细,及每个月最后一次购买时间
select *,last_value(orderdate) over(partition by name,substring(orderdate,1,7) order by orderdate rows between unbounded preceding and unbounded following)
from business;
需求8:求每个顾客的购买明细和最近3次的购买总额
select *,
sum(cost) over(partition by name order by orderdate rows between 2 preceding and current row)
from business;
孙悟空 语文 87
孙悟空 数学 95
孙悟空 英语 68
大海 语文 94
大海 数学 56
大海 英语 84
宋宋 语文 64
宋宋 数学 86
宋宋 英语 84
婷婷 语文 65
婷婷 数学 85
婷婷 英语 78
计算每门学科成绩排名。
RANK :计算排名顺序,允许并列,遇到并列下一个序号会跳过
ROW_NUMBER :行号
DENSE_RANK :计算排名顺序,允许并列,遇到并列下一个序号不会跳过
CUME_DIST :计算从第一行到当前行的行数占总行数的比例
PERCENT_RANK :rank -1 /总行数 -1
NTILE :nile(n),将数据集平均分为n份,给每份从1开始标记一个数字
select *,
rank() over(partition by subject order by score desc) rank_no,
ROW_NUMBER() over(partition by subject order by score desc) row_no,
DENSE_RANK() over(partition by subject order by score desc) dense_no,
CUME_DIST() over(partition by subject order by score desc) cume_no,
PERCENT_RANK() over(partition by subject order by score desc) PERCENT_no
from score;
(2) 计算每个学科前三名学生
select *,ROW_NUMBER() over(partition by subject order by score desc) row_no
from score;
select *
from
(select *,ROW_NUMBER() over(partition by subject order by score desc) row_no
from score)t1
where row_no <4;
(3)计算每个学生的总分排名
①先计算总分
select name,sum(score) ss
from score
group by name;
②按总分排名
select *,
rank() over(order by ss desc)
from
(select name,sum(score) ss
from score
group by name)t1;
或:
select *,
dense_rank() over(order by ss desc)
from
(select name,sum(score) over(partition by name)ss
from score)t1;
(4) 求学生的成绩明细,总分和总分排名
select *,
sum(score) over(partition by name) ss, ×
rank() over(order by ss desc)
from score;
select *,sum(score) over(partition by name) ss --t1
from score;
select *,
dense_rank() over(order by ss desc)
from
(select *,sum(score) over(partition by name) ss
from score)t1;
(5)求学生的成绩明细,并显示当前科目的最高分