查询语句顺序 |
备注 |
with t1 as(嵌套查询语句) |
|
select 字段1,字段2 over (partition by … order by … rows between … and …) |
最后一个字段可以开窗 |
from 表名/(嵌套查询语句) |
|
join 表名/(嵌套查询语句) |
|
on … |
关联条件 |
where … |
过滤条件 |
group by … |
|
having … |
必须跟 group by连用 |
order by … |
全局排序 |
cluster by … / distribute by … |
分桶 |
sort by … |
桶内排序 |
limit … |
数据量限制 |
1、order by :全表排序 select * from tmp order by name desc,cost desc;
2、distribute by:(对导出到本地的数据分区)相当于mapreduce中的partitioner,可以控制某条数据进入哪个reduce(前提是reduce的数量不是1),规则是distribute by的字段hash值和reduce个数取余,每个reduce内部可以使用sort by 排序 |
将查询到的数据导出到本地:
insert overwrite local directory '/opt/module/datas/info'
row format delimited fields terminated by '\t'
select * from tmp distribute by cost;
insert overwrite local directory '/opt/module/datas/info'
row format delimited fields terminated by '\t'
select * from tmp distribute by cost sort byyue;
insert overwrite local directory '/opt/module/datas/info'
row format delimited fields terminated by '\t'
select * from tmp distribute by cost sort bycost desc;
3、cluster by :当distribute by 和sort by的字段一样时,相当于cluster by,区别在于cluster by不能倒序排序
select * from tmp cluster by cost;
注意:区分cluster by 和clustered by
4、limit 取结果集的前几行
select * from tmp where cost > 40;
select * from tmp where yue < 10;
计算每个人的平均消费额:
select name,avg(cost),sum(cost) from tmp group by name;
select name,avg(cost)acost from tmp group by name where acost>30; ×
select name,avg(cost)acost from tmp group by name having acost>30;
with t1 as(select name,avg(cost)acost from tmp group by name)
select * from t1 where acost>30;
where和having:where是过滤原表的数据,having是过滤group by之后聚合函数的运算结果,where执行在group by之前,having执行在group by之后
select name,avg(cost) acost,sum(cost) as scost from tmp group by name;
------------------------------------------------------------------------------------------
join操作:
查询每个员工所在的部门
select emp.name name,dept.deptname dname
from emp join dept on emp.deptno=dept.deptno;
with t1 as(select emp.name name,dept.deptname dname from emp join dept on emp.deptno=dept.deptno)
select * from t1;
select e.name name,d.deptname dname
from emp e join dept d on e.deptno=d.deptno;
内连接:inner join :只保留能匹配上的数据(inner可省略)
外连接:outer join
left outer join / left join :左连接,保留左表(写在join左边的表)的所有数据,右表进行匹配,匹配不上的用null填充
select e.name name,d.deptname dname
from emp e left join dept d on e.deptno=d.deptno;
right outer join / right join :右连接,保留右表(写在join右边的表)的所有数据,左表进行匹配,匹配不上的用null填充
select e.name name,d.deptname dname
from emp e right join dept d on e.deptno=d.deptno;
full outer join / full join :满外连接,保留两个表的所有数据,其中一个表匹配不上的都用null填充
select e.name name,d.deptname dname
from emp e full join dept d on e.deptno=d.deptno;
查询每个员工负责的区域: