Oracle analytics in basic sql queries
When I first saw analytics appearing in Oracle last century, I did not realised they were going to change my way of writting basic SQL queries.
Some (Variance, deviance) are truely mathematical and still reserved for statistical analysis.
Here I will try to describe ROW_NUMBER :
Back in Oracle 7, I remember to have written a lot of reports using in max subqueries, like in
select deptno,ename,sal from emp where (deptno,sal) in (select deptno,max(sal) from emp group by deptno);
With analytics, I can rewrite it with
select deptno,ename,sal from (select emp.*, rank() over (partition by deptno order by sal desc) r from emp) where r=1;
If I want to get exactly one row per deptno, I could then write something like
select deptno,ename,sal from (select emp.*, row_number() over (partition by deptno order by sal desc) r from emp) where r=1;
or better
select deptno,ename,sal from (select emp.*, row_number() over (partition by deptno order by sal desc, empno) r from emp) where r=1;
row_number will select only one row. I prefer the second, because empno is a primary key and the result will be constant over time. In the first solution, I cannot determine which row will be returned, and Oracle may choose one on the morning, and another one in the afternoon, depending on the execution plan (new index/new stats/different load/...).
Note that I can also select the 2nd biggest salary (r=2), or the top 5 (r<6)
It is also very performant, because you are doing only one full table scan instead of two
Some (Variance, deviance) are truely mathematical and still reserved for statistical analysis.
Here I will try to describe ROW_NUMBER :
Back in Oracle 7, I remember to have written a lot of reports using in max subqueries, like in
select deptno,ename,sal from emp where (deptno,sal) in (select deptno,max(sal) from emp group by deptno);
With analytics, I can rewrite it with
select deptno,ename,sal from (select emp.*, rank() over (partition by deptno order by sal desc) r from emp) where r=1;
If I want to get exactly one row per deptno, I could then write something like
select deptno,ename,sal from (select emp.*, row_number() over (partition by deptno order by sal desc) r from emp) where r=1;
or better
select deptno,ename,sal from (select emp.*, row_number() over (partition by deptno order by sal desc, empno) r from emp) where r=1;
row_number will select only one row. I prefer the second, because empno is a primary key and the result will be constant over time. In the first solution, I cannot determine which row will be returned, and Oracle may choose one on the morning, and another one in the afternoon, depending on the execution plan (new index/new stats/different load/...).
Note that I can also select the 2nd biggest salary (r=2), or the top 5 (r<6)
It is also very performant, because you are doing only one full table scan instead of two
1 Comments:
Just wanted to say thanks for the post. I discovered Oracle's analytical functions only about a year ago and have been using them like crazy ever since. I hadn't thought of using the row fuction that you demonstrated though. Good work.
JW, Oklahoma City, Oklahoma
Post a Comment
<< Home