### Oracle analytics in basic sql queries

When I first saw analytics appearing in Oracle last century, I did not realised they were going to change my way of writting basic SQL queries.

Some (Variance, deviance) are truely mathematical and still reserved for statistical analysis.

Here I will try to describe ROW_NUMBER :

Back in Oracle 7, I remember to have written a lot of reports using

select deptno,ename,sal from emp where (deptno,sal) in (select deptno,max(sal) from emp group by deptno);

With analytics, I can rewrite it with

select deptno,ename,sal from (select emp.*, rank() over (partition by deptno order by sal desc) r from emp) where r=1;

If I want to get exactly one row per deptno, I could then write something like

select deptno,ename,sal from (select emp.*, row_number() over (partition by deptno order by sal desc) r from emp) where r=1;

or better

select deptno,ename,sal from (select emp.*, row_number() over (partition by deptno order by sal desc, empno) r from emp) where r=1;

row_number will select only one row. I prefer the second, because empno is a primary key and the result will be constant over time. In the first solution, I cannot determine which row will be returned, and Oracle may choose one on the morning, and another one in the afternoon, depending on the execution plan (new index/new stats/different load/...).

Note that I can also select the 2nd biggest salary (r=2), or the top 5 (r<6)

It is also very performant, because you are doing only one full table scan instead of two

Some (Variance, deviance) are truely mathematical and still reserved for statistical analysis.

Here I will try to describe ROW_NUMBER :

Back in Oracle 7, I remember to have written a lot of reports using

*in max*subqueries, like inselect deptno,ename,sal from emp where (deptno,sal) in (select deptno,max(sal) from emp group by deptno);

With analytics, I can rewrite it with

select deptno,ename,sal from (select emp.*, rank() over (partition by deptno order by sal desc) r from emp) where r=1;

If I want to get exactly one row per deptno, I could then write something like

select deptno,ename,sal from (select emp.*, row_number() over (partition by deptno order by sal desc) r from emp) where r=1;

or better

select deptno,ename,sal from (select emp.*, row_number() over (partition by deptno order by sal desc, empno) r from emp) where r=1;

row_number will select only one row. I prefer the second, because empno is a primary key and the result will be constant over time. In the first solution, I cannot determine which row will be returned, and Oracle may choose one on the morning, and another one in the afternoon, depending on the execution plan (new index/new stats/different load/...).

Note that I can also select the 2nd biggest salary (r=2), or the top 5 (r<6)

It is also very performant, because you are doing only one full table scan instead of two

## 1 Comments:

Just wanted to say thanks for the post. I discovered Oracle's analytical functions only about a year ago and have been using them like crazy ever since. I hadn't thought of using the row fuction that you demonstrated though. Good work.

JW, Oklahoma City, Oklahoma

Post a Comment

<< Home