You are now on my old blog. Please update your bookmarks to my new blog
http://laurentschneider.com




18 September 2006

KEEP DENSE_RANK versus ROW_NUMBER()

I often see questions like

How do you get the row of each department with the highest salary

In case you only want 1 row, you have two modern solutions :
Analytics, which is trend, and KEEP, which is not very known

The old fashion would be something like where s in (select max())

Ok, let's start with analytics


SQL> select ename,deptno,sal
2 from (select ename,deptno,sal,
3 row_number() over (partition by deptno
4 order by sal desc,empno) r from emp)
5 where r=1;
ENAME DEPTNO SAL
---------- ---------- ----------
KING 10 5000
SCOTT 20 3000
BLAKE 30 2850


and the KEEP method, which is a special aggregation


SQL> select max(ename) keep (dense_rank first
2 order by sal desc,empno) ename,
3 deptno,max(sal) sal
4 from emp group by deptno;
ENAME DEPTNO SAL
---------- ---------- ----------
KING 10 5000
SCOTT 20 3000
BLAKE 30 2850



the second one should be more performant


SQL> select count(*) from emp2;
COUNT(*)
----------
917504

SQL> select max(ename) keep (dense_rank first
2 order by sal desc,empno) ename,
3 deptno,max(sal) sal
4 from emp group by deptno;
ENAME DEPTNO SAL
---------- ---------- ----------
KING 10 5000
SCOTT 20 3000
BLAKE 30 2850

Elapsed: 00:00:01.00
SQL> select ename,deptno,sal
2 from (select ename,deptno,sal,
3 row_number() over (partition by deptno
4 order by sal desc,empno) r from emp)
5 where r=1;
ENAME DEPTNO SAL
---------- ---------- ----------
KING 10 5000
SCOTT 20 3000
BLAKE 30 2850

Elapsed: 00:00:01.43

8 Comments:

Blogger Nicolas Gasparotto said...

That's why your post here

;-)

18/9/06 23:06  
Blogger Mennan said...

Can it be possible to use KEEP to find out duplicates that have more than one key?
I found duplicates with two methods and tried to find out with KEEP. My work is below:

Connected to Oracle Database 10g Enterprise Edition Release 10.2.0.1.0
Connected as HR


SQL>
SQL> drop table dup;

Table dropped

SQL> create table dup(i number, x varchar2(8), d date );

Table created

SQL> BEGIN
2 INSERT INTO dup VALUES (1, 'X', trunc(SYSDATE + 5));
3 INSERT INTO dup VALUES (1, 'X', trunc(SYSDATE));
4 INSERT INTO dup VALUES (2, 'X', trunc(SYSDATE));
5 INSERT INTO dup VALUES (2, 'C', trunc(SYSDATE - 1));
6 INSERT INTO dup VALUES (2, 'Y', trunc(SYSDATE - 3));
7 INSERT INTO dup VALUES (2, 'Y', trunc(SYSDATE + 3));
8 INSERT INTO dup VALUES (1, 'Z', trunc(SYSDATE));
9 INSERT INTO dup VALUES (1, 'X', trunc(SYSDATE + 1));
10 INSERT INTO dup VALUES (1, 'X', trunc(SYSDATE));
11 INSERT INTO dup VALUES (4, 'Z', trunc(SYSDATE + 1));
12 INSERT INTO dup VALUES (4, 'Z', trunc(SYSDATE + 1));
13 INSERT INTO dup VALUES (4, 'Z', trunc(SYSDATE + 2));
14 INSERT INTO dup VALUES (4, 'Z', trunc(SYSDATE + 1));
15 END;
16 /

PL/SQL procedure successfully completed

SQL> commit;

Commit complete

SQL> SELECT * FROM dup;

I X D
---------- -------- -----------
1 X 30.09.2006
1 X 25.09.2006
2 X 25.09.2006
2 C 24.09.2006
2 Y 22.09.2006
2 Y 28.09.2006
1 Z 25.09.2006
1 X 26.09.2006
1 X 25.09.2006
4 Z 26.09.2006
4 Z 26.09.2006
4 Z 27.09.2006
4 Z 26.09.2006

13 rows selected

SQL> --
SQL> SELECT i, x, d, COUNT(*)
2 FROM dup
3 GROUP BY i, x, d
4 HAVING COUNT(*) > 1;

I X D COUNT(*)
---------- -------- ----------- ----------
4 Z 26.09.2006 3
1 X 25.09.2006 2

SQL> --
SQL> SELECT dd.i, dd.x, dd.d
2 FROM (SELECT i,
3 x,
4 d,
5 ROWID,
6 row_number() over(PARTITION BY i, x, d ORDER BY ROWID) rn
7 FROM dup) dd
8 WHERE dd.rn != 1;

I X D
---------- -------- -----------
1 X 25.09.2006
4 Z 26.09.2006
4 Z 26.09.2006

SQL> --
SQL> SELECT *
2 FROM dup a
3 WHERE a.ROWID NOT IN (SELECT MAX(ROWID)
4 FROM dup b
5 WHERE a.i = b.i
6 AND a.x = b.x
7 AND a.d = b.d);

I X D
---------- -------- -----------
1 X 25.09.2006
4 Z 26.09.2006
4 Z 26.09.2006

SQL>

25/9/06 16:08  
Blogger Laurent Schneider said...

well, group by will suppress rows, so you will not get it without subquery.

evtl
select i,x,d,rowid from dup minus
select i,x,d,max(rowid) keep (dense_rank first order by rownum) from dup group by i,x,d

I X D ROWID
---------- -------- --------- ------------------
1 X 25-SEP-06 AAAE5oAAEAAAAA6AAI
4 Z 26-SEP-06 AAAE5oAAEAAAAA6AAK
4 Z 26-SEP-06 AAAE5oAAEAAAAA6AAM


but actually the keep is not needed!

and rownumber is better

25/9/06 16:56  
Blogger Laurent Schneider said...

in case you cannot use rowid, I could imagine using rownum like that :

select i,x,d,rownum from dup minus
select i,x,d,max(rownum) from dup group by i,x,d;

I X D ROWNUM
---------- -------- --------- ----------
1 X 25-SEP-06 2
4 Z 26-SEP-06 10
4 Z 26-SEP-06 11

25/9/06 17:07  
Blogger Mennan said...

Yes you are right keep is not needed. As i see without subquery it is not possible. I will do a simple performance test will post the results as soon as possible.

I could not undersstand why you said "in case you cannot use rowid". Using rowid is giving is exact solution, isn't it?

Thanks for replies...
Regards, Mennan

26/9/06 08:28  
Blogger Laurent Schneider said...

if you want to select from a view, or a subquery, you may want to use rownum

SQL> with t as (select 1 x from dual union all select 2 from dual union all select 2 from dual)
2 select x, rownum from t minus select x,max(rownum) from t group by x;
X ROWNUM
---------- ----------
2 2

SQL> with t as (select 1 x from dual union all select 2 from dual union all select 2 from dual)
2 select x, rowid from t minus select x,max(rowid) from t group by x;
select x, rowid from t minus select x,max(rowid) from t group by x
*
ERROR at line 2:
ORA-01446: cannot select ROWID from, or sample, a view with DISTINCT, GROUP BY, etc.


about performance, the minus + max will be terrible, I do not want to see it ;-)

go for rownumber+rowid

26/9/06 10:17  
Blogger Mennan said...

Thanks, i got it

26/9/06 10:33  
Anonymous Anonymous said...

When I look at Explain Plan this doesn't do what I expect and does not look as efficient as I expect.


SQL> explain plan for select min(component_id) keep (dense_rank first order by component_id nulls last) waof
2 , max(component_id) keep (dense_rank first order by component_id nulls last) waof2
3 from i_component
4 /

已解释。

SQL> @C:\oraclexe\app\oracle\product\10.2.0\server\RDBMS\ADMIN\utlxpls.sql

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1142152098

-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 1 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | INDEX FULL SCAN| I_COMPONENT_PK | 391 | 1564 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------

已选择9行。


Why doesn't it just get the first row from the primary key rather than doing the Index Full Scan?

6/10/06 08:02  

Post a Comment

<< Home