You are now on my old blog. Please update your bookmarks to my new blog
http://laurentschneider.com




14 December 2005

row generators performance

I wrote a few generators, and listed some existant in
http://laurentschneider.blogspot.com/2005/08/pivot-table.html

I decided to test them

note that this is not a "good" test, it is simply an overview. I am doing a count(*), another operation may be better in one or worst in another one.

all_objects is so slow (6 seconds for 50K rows) that I did not include it in my test

generate 1000 rows : all method are fast, except xquery...

generate 1000000 rows : model and cube never ends, xquery is very slow, connect and union are slow, plsql is ok, simple heap table is very fast (!)

I would like to point out that "generating rows" is rarely a business requirement. If you need to outer join with every day of the year, than create a table and insert every day of the year in it. It will be very fast, it is maintenable, extensible, supported, and self-documented...

Ok, here is my test

create type t_number as TABLE OF number;
/

create function f_number(n number) return t_number pipelined is
begin for i in 1..n loop pipe row(i); end loop; return; end;
/

create table t1000 as select 0 n from xmltable('for $i in 1 to 1000 return $i' );
create table t1000000 as select 0 n from xmltable('for $i in 1 to 1000000 return $i' );

set timi on feedb off echo off head off

prompt model 1000
select count(*)
from (
select null
from dual
model
dimension by (1 h)
measures (1 c)
rules( c[FOR h FROM 1 to 1000 INCREMENT 1] = 1)
)
/

prompt xquery 1000
select count(*)
from
xmltable('for $i in 1 to 1000 return $i' )
/

prompt xquery 1000000
select count(*)
from
xmltable('for $i in 1 to 1000000 return $i' )
/

prompt union 2*2*... 1K
with s as (select null from dual union all select null from dual)
select count(*)
from s,s,s,s,s,s,s,s,s,s
/

prompt union 1M
with s as (select null from dual union all select null from dual)
select count(*)
from
s,s,s,s,s,s,s,s,s,s,
s,s,s,s,s,s,s,s,s,s
/

prompt cube 1K
select count(*)
from (
select 1,2,3,4,5,6,7,8,9,10
from dual
group by cube(1,2,3,4,5,6,7,8,9,10)
)
/

prompt table 1000
select count(*)
from t1000
/

prompt table 1000000
select count(*)
from t1000000
/

set termout off
create type t_number as TABLE OF number;
/

create function f_number(n number) return t_number pipelined is
begin for i in 1..n loop pipe row(i); end loop; return; end;
/
set termout on

prompt plsql 1000
select count(*)
from table(f_number(1000))
/

prompt plsql 1000000
select count(*)
from table(f_number(1000000))
/

prompt all_objects
select count(*)
from all_objects
/

prompt connect 1000
select count(*)
from (
select level
from dual
connect by level<1001
)
/

prompt connect 1000000
select count(*)
from (
select level
from dual
connect by level<1000001
)
/



===============================================

model 1000
1000
Elapsed: 00:00:00.01
xquery 1000
1000
Elapsed: 00:00:00.30
xquery 1000000
1000000
Elapsed: 00:00:20.63
union 2*2*... 1K
1024
Elapsed: 00:00:00.02
union 1M
1048576
Elapsed: 00:00:01.32
cube 1K
1024
Elapsed: 00:00:00.00
table 1000
1000
Elapsed: 00:00:00.00
table 1000000
1000000
Elapsed: 00:00:00.21
plsql 1000
1000
Elapsed: 00:00:00.01
plsql 1000000
1000000
Elapsed: 00:00:00.91
all_objects
46709
Elapsed: 00:00:05.36
connect 1000
1000
Elapsed: 00:00:00.01
connect 1000000
1000000
Elapsed: 00:00:01.40

6 Comments:

Blogger Colin Sheppard said...

Elementary question:

What would you apply row generators for?

15/12/05 15:19  
Blogger Laurent Schneider said...

I have used them many times to answer user questions, but myself, I have never used them in my productive scripts as far as I can remember...
Thanks for your question ;-)

15/12/05 15:27  
Blogger Thomas Kyte said...

filling in the gaps!


ops$tkyte@ORA10GR2> select * from inv;

DT ITEM QTY
--------- ---------- ----------
01-JAN-05 item1 55
08-JAN-05 item1 20
15-JAN-05 item1 30
01-JAN-05 item2 155
04-JAN-05 item2 120

but we need to do time series analysis so we need to have every day filled in (and qty carried down..)



ops$tkyte@ORA10GR2> with dates as
2 (select to_date('01-jan-2005')+level-1 dt
3 from dual connect by level <= 15)
4 select dates.dt, inv.item,
5 last_value( inv.qty ignore nulls )
6 over (partition by inv.item order by dates.dt) qty
7 from dates left outer join inv partition by (item) on (dates.dt = inv.dt)
8 /

DT ITEM QTY
--------- ---------- ----------
01-JAN-05 item1 55
02-JAN-05 item1 55
03-JAN-05 item1 55
04-JAN-05 item1 55
05-JAN-05 item1 55
06-JAN-05 item1 55
07-JAN-05 item1 55
08-JAN-05 item1 20
09-JAN-05 item1 20
10-JAN-05 item1 20
11-JAN-05 item1 20
12-JAN-05 item1 20
13-JAN-05 item1 20
14-JAN-05 item1 20
15-JAN-05 item1 30
01-JAN-05 item2 155
02-JAN-05 item2 155
03-JAN-05 item2 155
04-JAN-05 item2 120
05-JAN-05 item2 120
06-JAN-05 item2 120
07-JAN-05 item2 120
08-JAN-05 item2 120
09-JAN-05 item2 120
10-JAN-05 item2 120
11-JAN-05 item2 120
12-JAN-05 item2 120
13-JAN-05 item2 120
14-JAN-05 item2 120
15-JAN-05 item2 120

30 rows selected.

15/12/05 17:17  
Blogger Laurent Schneider said...

thanks tom for your comment.
personnaly, I prefer

create table dates (dt date);
exec for i in 0..3000 loop insert into dates values(date '2000-01-01'+i); end loop

but this is a little bit Oracle6 dba style ;-)

style a row generator is very powerfull, but I use it more in my answers than in my scripts!

15/12/05 19:58  
Blogger Bob B said...

It is nice to join to a premade table, but in some cases, that becomes a maintenance nightmare. Suppose you want to let users run reports on the last 36 months of data. A 36 row table for each month works fine, but now you need to maintain the information in the table so that it always contains all available dates. Generating the rows, you only have to validate that the reporting dates are within a range of TRUNC( SYSDATE, 'MON' ).

19/12/05 16:16  
Blogger Laurent Schneider said...

however, with predefined tables, you can use materialized views, dimensions, hierarchies, partitioning, parallelism, etc...

the "dimension" table containing the months could help, but maybe you just need to having a look at "moving sum", a window clause of the analytics which let you defines relative values (range between 1080 preceding and current row)

have a look at the datawarehouse guide about the "time" dimension example.

19/12/05 20:35  

Post a Comment

<< Home