Re: Should Oracle outperform PostgreSQL on a complex

Mark Kirkwood <markir@xxxxxxxxxxxxxxx> · Sun, 18 Dec 2005 17:07:41 +1300

Tom Lane wrote:

2. transform joins into subselects, then return subselect rows via an
index bitmap. Joins are performed via a bitmap addition process.

Looks like 8.1 pretty much does this right now:

First the basic star:

EXPLAIN ANALYZE
SELECT
        d0.dmth,
        d1.dat,
        count(f.fval )
FROM
        dim0 AS d0,
        dim1 AS d1,
        fact0 AS f
WHERE   d0.d0key = f.d0key
AND     d1.d1key = f.d1key
AND     d0.dyr BETWEEN 2010 AND 2015
AND     d1.dattyp BETWEEN '10th measure type' AND '14th measure type'
GROUP BY
        d0.dmth,
        d1.dat
;

QUERY PLAN 

----------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=334842.41..334846.53 rows=329 width=37) (actual 
time=144317.960..144318.814 rows=120 loops=1)
   ->  Hash Join  (cost=145.72..334636.91 rows=27400 width=37) (actual 
time=1586.363..142831.025 rows=201600 loops=1)
         Hash Cond: ("outer".d0key = "inner".d0key)
         ->  Hash Join  (cost=89.72..333279.41 rows=137001 width=37) 
(actual time=1467.322..135585.317 rows=1000000 loops=1)
               Hash Cond: ("outer".d1key = "inner".d1key)
               ->  Seq Scan on fact0 f  (cost=0.00..281819.45 
rows=10000045 width=12) (actual time=120.881..70364.473 rows=10000000 
loops=1)
               ->  Hash  (cost=89.38..89.38 rows=137 width=33) (actual 
time=24.822..24.822 rows=660 loops=1)
                     ->  Index Scan using dim1_dattyp on dim1 d1 
(cost=0.00..89.38 rows=137 width=33) (actual time=0.502..19.374 rows=660 
loops=1)
                           Index Cond: (((dattyp)::text >= '10th 
measure type'::text) AND ((dattyp)::text <= '14th measure type'::text))
         ->  Hash  (cost=51.00..51.00 rows=2000 width=8) (actual 
time=31.620..31.620 rows=2016 loops=1)
               ->  Index Scan using dim0_dyr on dim0 d0 
(cost=0.00..51.00 rows=2000 width=8) (actual time=0.379..17.377 
rows=2016 loops=1)
                     Index Cond: ((dyr >= 2010) AND (dyr <= 2015))
 Total runtime: 144320.588 ms
(13 rows)

Now apply the star transformation:

EXPLAIN ANALYZE
SELECT
        d0.dmth,
        d1.dat,
        count(f.fval )
FROM
        dim0 AS d0,
        dim1 AS d1,
        fact0 AS f
WHERE   d0.d0key = f.d0key
AND     d1.d1key = f.d1key
AND     d0.dyr BETWEEN 2010 AND 2015
AND     d1.dattyp BETWEEN '10th measure type' AND '14th measure type'
AND     f.d0key IN (SELECT cd0.d0key FROM dim0 cd0
                    WHERE cd0.dyr BETWEEN 2010 AND 2015)
AND     f.d1key IN (SELECT cd1.d1key FROM dim1 cd1
                    WHERE cd1.dattyp BETWEEN '10th measure type'
                                     AND '14th measure type')
GROUP BY
        d0.dmth,
        d1.dat
;

       QUERY PLAN 

----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=129230.89..129231.83 rows=75 width=37) (actual 
time=39798.192..39799.015 rows=120 loops=1)
   ->  Nested Loop IN Join  (cost=149.44..129230.33 rows=75 width=37) 
(actual time=269.919..38125.520 rows=201600 loops=1)
         ->  Hash Join  (cost=147.43..128171.03 rows=375 width=45) 
(actual time=269.516..27342.866 rows=201600 loops=1)
               Hash Cond: ("outer".d0key = "inner".d0key)
               ->  Nested Loop  (cost=91.43..128096.03 rows=2000 
width=37) (actual time=152.084..19869.365 rows=1000000 loops=1)
                     ->  Hash Join  (cost=91.43..181.52 rows=2 
width=37) (actual time=29.931..46.339 rows=660 loops=1)
                           Hash Cond: ("outer".d1key = "inner".d1key)
                           ->  Index Scan using dim1_dattyp on dim1 d1 
 (cost=0.00..89.38 rows=137 width=33) (actual time=0.516..7.683 
rows=660 loops=1)
                                 Index Cond: (((dattyp)::text >= '10th 
measure type'::text) AND ((dattyp)::text <= '14th measure type'::text))
                           ->  Hash  (cost=91.09..91.09 rows=137 
width=4) (actual time=29.238..29.238 rows=660 loops=1)
                                 ->  HashAggregate  (cost=89.72..91.09 
rows=137 width=4) (actual time=20.940..24.900 rows=660 loops=1)
                                       ->  Index Scan using dim1_dattyp 
on dim1 cd1  (cost=0.00..89.38 rows=137 width=4) (actual 
time=0.042..14.841 rows=660 loops=1)
                                             Index Cond: 
(((dattyp)::text >= '10th measure type'::text) AND ((dattyp)::text <= 
'14th measure type'::text))
                     ->  Index Scan using fact0_d1key on fact0 f 
(cost=0.00..62707.26 rows=100000 width=12) (actual time=0.205..12.691 
rows=1515 loops=660)
                           Index Cond: ("outer".d1key = f.d1key)
               ->  Hash  (cost=51.00..51.00 rows=2000 width=8) (actual 
time=31.264..31.264 rows=2016 loops=1)
                     ->  Index Scan using dim0_dyr on dim0 d0 
(cost=0.00..51.00 rows=2000 width=8) (actual time=0.339..16.885 
rows=2016 loops=1)
                           Index Cond: ((dyr >= 2010) AND (dyr <= 2015))
         ->  Bitmap Heap Scan on dim0 cd0  (cost=2.00..2.81 rows=1 
width=4) (actual time=0.031..0.031 rows=1 loops=201600)
               Recheck Cond: ("outer".d0key = cd0.d0key)
               Filter: ((dyr >= 2010) AND (dyr <= 2015))
               ->  Bitmap Index Scan on dim0_d0key  (cost=0.00..2.00 
rows=1 width=0) (actual time=0.015..0.015 rows=1 loops=201600)
                     Index Cond: ("outer".d0key = cd0.d0key)
 Total runtime: 39800.294 ms
(24 rows)

The real run times are more like 24s and 9s, but you get the idea.

Cheers

Mark