KEEP DENSE_RANK versus ROW_NUMBER()
I often see questions like
How do you get the row of each department with the highest salary
In case you only want 1 row, you have two modern solutions :
Analytics, which is trend, and KEEP, which is not very known
The old fashion would be something like where s in (select max())
Ok, let's start with analytics
and the KEEP method, which is a special aggregation
the second one should be more performant
How do you get the row of each department with the highest salary
In case you only want 1 row, you have two modern solutions :
Analytics, which is trend, and KEEP, which is not very known
The old fashion would be something like where s in (select max())
Ok, let's start with analytics
SQL> select ename,deptno,sal
2 from (select ename,deptno,sal,
3 row_number() over (partition by deptno
4 order by sal desc,empno) r from emp)
5 where r=1;
ENAME DEPTNO SAL
---------- ---------- ----------
KING 10 5000
SCOTT 20 3000
BLAKE 30 2850
and the KEEP method, which is a special aggregation
SQL> select max(ename) keep (dense_rank first
2 order by sal desc,empno) ename,
3 deptno,max(sal) sal
4 from emp group by deptno;
ENAME DEPTNO SAL
---------- ---------- ----------
KING 10 5000
SCOTT 20 3000
BLAKE 30 2850
the second one should be more performant
SQL> select count(*) from emp2;
COUNT(*)
----------
917504
SQL> select max(ename) keep (dense_rank first
2 order by sal desc,empno) ename,
3 deptno,max(sal) sal
4 from emp group by deptno;
ENAME DEPTNO SAL
---------- ---------- ----------
KING 10 5000
SCOTT 20 3000
BLAKE 30 2850
Elapsed: 00:00:01.00
SQL> select ename,deptno,sal
2 from (select ename,deptno,sal,
3 row_number() over (partition by deptno
4 order by sal desc,empno) r from emp)
5 where r=1;
ENAME DEPTNO SAL
---------- ---------- ----------
KING 10 5000
SCOTT 20 3000
BLAKE 30 2850
Elapsed: 00:00:01.43
8 Comments:
That's why your post here
;-)
Can it be possible to use KEEP to find out duplicates that have more than one key?
I found duplicates with two methods and tried to find out with KEEP. My work is below:
Connected to Oracle Database 10g Enterprise Edition Release 10.2.0.1.0
Connected as HR
SQL>
SQL> drop table dup;
Table dropped
SQL> create table dup(i number, x varchar2(8), d date );
Table created
SQL> BEGIN
2 INSERT INTO dup VALUES (1, 'X', trunc(SYSDATE + 5));
3 INSERT INTO dup VALUES (1, 'X', trunc(SYSDATE));
4 INSERT INTO dup VALUES (2, 'X', trunc(SYSDATE));
5 INSERT INTO dup VALUES (2, 'C', trunc(SYSDATE - 1));
6 INSERT INTO dup VALUES (2, 'Y', trunc(SYSDATE - 3));
7 INSERT INTO dup VALUES (2, 'Y', trunc(SYSDATE + 3));
8 INSERT INTO dup VALUES (1, 'Z', trunc(SYSDATE));
9 INSERT INTO dup VALUES (1, 'X', trunc(SYSDATE + 1));
10 INSERT INTO dup VALUES (1, 'X', trunc(SYSDATE));
11 INSERT INTO dup VALUES (4, 'Z', trunc(SYSDATE + 1));
12 INSERT INTO dup VALUES (4, 'Z', trunc(SYSDATE + 1));
13 INSERT INTO dup VALUES (4, 'Z', trunc(SYSDATE + 2));
14 INSERT INTO dup VALUES (4, 'Z', trunc(SYSDATE + 1));
15 END;
16 /
PL/SQL procedure successfully completed
SQL> commit;
Commit complete
SQL> SELECT * FROM dup;
I X D
---------- -------- -----------
1 X 30.09.2006
1 X 25.09.2006
2 X 25.09.2006
2 C 24.09.2006
2 Y 22.09.2006
2 Y 28.09.2006
1 Z 25.09.2006
1 X 26.09.2006
1 X 25.09.2006
4 Z 26.09.2006
4 Z 26.09.2006
4 Z 27.09.2006
4 Z 26.09.2006
13 rows selected
SQL> --
SQL> SELECT i, x, d, COUNT(*)
2 FROM dup
3 GROUP BY i, x, d
4 HAVING COUNT(*) > 1;
I X D COUNT(*)
---------- -------- ----------- ----------
4 Z 26.09.2006 3
1 X 25.09.2006 2
SQL> --
SQL> SELECT dd.i, dd.x, dd.d
2 FROM (SELECT i,
3 x,
4 d,
5 ROWID,
6 row_number() over(PARTITION BY i, x, d ORDER BY ROWID) rn
7 FROM dup) dd
8 WHERE dd.rn != 1;
I X D
---------- -------- -----------
1 X 25.09.2006
4 Z 26.09.2006
4 Z 26.09.2006
SQL> --
SQL> SELECT *
2 FROM dup a
3 WHERE a.ROWID NOT IN (SELECT MAX(ROWID)
4 FROM dup b
5 WHERE a.i = b.i
6 AND a.x = b.x
7 AND a.d = b.d);
I X D
---------- -------- -----------
1 X 25.09.2006
4 Z 26.09.2006
4 Z 26.09.2006
SQL>
well, group by will suppress rows, so you will not get it without subquery.
evtl
select i,x,d,rowid from dup minus
select i,x,d,max(rowid) keep (dense_rank first order by rownum) from dup group by i,x,d
I X D ROWID
---------- -------- --------- ------------------
1 X 25-SEP-06 AAAE5oAAEAAAAA6AAI
4 Z 26-SEP-06 AAAE5oAAEAAAAA6AAK
4 Z 26-SEP-06 AAAE5oAAEAAAAA6AAM
but actually the keep is not needed!
and rownumber is better
in case you cannot use rowid, I could imagine using rownum like that :
select i,x,d,rownum from dup minus
select i,x,d,max(rownum) from dup group by i,x,d;
I X D ROWNUM
---------- -------- --------- ----------
1 X 25-SEP-06 2
4 Z 26-SEP-06 10
4 Z 26-SEP-06 11
Yes you are right keep is not needed. As i see without subquery it is not possible. I will do a simple performance test will post the results as soon as possible.
I could not undersstand why you said "in case you cannot use rowid". Using rowid is giving is exact solution, isn't it?
Thanks for replies...
Regards, Mennan
if you want to select from a view, or a subquery, you may want to use rownum
SQL> with t as (select 1 x from dual union all select 2 from dual union all select 2 from dual)
2 select x, rownum from t minus select x,max(rownum) from t group by x;
X ROWNUM
---------- ----------
2 2
SQL> with t as (select 1 x from dual union all select 2 from dual union all select 2 from dual)
2 select x, rowid from t minus select x,max(rowid) from t group by x;
select x, rowid from t minus select x,max(rowid) from t group by x
*
ERROR at line 2:
ORA-01446: cannot select ROWID from, or sample, a view with DISTINCT, GROUP BY, etc.
about performance, the minus + max will be terrible, I do not want to see it ;-)
go for rownumber+rowid
Thanks, i got it
When I look at Explain Plan this doesn't do what I expect and does not look as efficient as I expect.
SQL> explain plan for select min(component_id) keep (dense_rank first order by component_id nulls last) waof
2 , max(component_id) keep (dense_rank first order by component_id nulls last) waof2
3 from i_component
4 /
已解释。
SQL> @C:\oraclexe\app\oracle\product\10.2.0\server\RDBMS\ADMIN\utlxpls.sql
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1142152098
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 | 1 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 4 | | |
| 2 | INDEX FULL SCAN| I_COMPONENT_PK | 391 | 1564 | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------
已选择9行。
Why doesn't it just get the first row from the primary key rather than doing the Index Full Scan?
Post a Comment
<< Home