Re: a JOIN on same table, but 'slided over'

Rafal Pietrak <rafal@xxxxxxxxxxxxxxxxxx> · Tue, 26 Jun 2007 23:11:33 +0200

I see. (Have actually tried it on a larger dataset - to see it for
myself ... it is optimised :)

Thenx again!

-R

On Tue, 2007-06-26 at 19:56 +0530, Gurjeet Singh wrote:
>     It _is_ the optimised version.... as you can see from the explain
> plans posted in the other mail, the planner shows that the cost is
> drastically less than the 'distinct on' version.
> 
>     For smaller data-sets 'distinct-on' version might seem faster, but
> for reasonably larger datasets, it's performance deteriorates
> exponentially... This is because of the Nested-loops involved in the
> plan...
> 
>     I increased your data-set to 10240 rows by executing the following
> query 10 times:
> 
> insert into test select id+(select max(id) from test), thread, info
> from test;
> 
>     On such data-set (which is not very large by any means), the
> standard SQL version executes in almost a second, and on the other
> hand, I had to cancel the EXPLAIN ANALYZE of the 'distinct on' query
> after letting it run for over three minutes!!!
> 
> postgres=# explain analyze
> postgres-# select       t1.id as id, t2.id as "id+1",
> postgres-#              t1.thread as thread, t2.thread as "thread+1",
> postgres-#              t1.info as info, t2.info as "info+1"
> postgres-# from test as t1, test as t2
> postgres-# where t2.id = ( select min(id) from test as t3 where t3.id
> > t1.id )
> postgres-# order by t1.id asc;
> 
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
>  Sort  (cost=2971.36..2996.96 rows=10240 width=24) (actual
> time=1004.031..1030.116 rows=10239 loops=1)
>    Sort Key: t1.id
>    Sort Method:  external sort  Disk: 416kB
>    ->  Merge Join  (cost=840.48..2289.28 rows=10240 width=24) (actual
> time=834.218..956.595 rows=10239 loops=1) 
>          Merge Cond: (t2.id = ((subplan)))
>          ->  Index Scan using test_id_key on test t2
> (cost=0.00..332.85 rows=10240 width=12) (actual time=0.060..24.503
> rows=10240 loops=1)
>          ->  Sort  (cost=840.48..866.08 rows=10240 width=12) (actual
> time=834.129..854.776 rows=10240 loops=1)
>                Sort Key: ((subplan))
>                Sort Method:  quicksort  Memory: 928kB
>                ->  Seq Scan on test t1  (cost=0.00..158.40 rows=10240
> width=12)(actual time=0.196..797.752 rows=10240 loops=1)
>                      SubPlan
>                        ->  Result  (cost= 0.04..0.05 rows=1 width=0)
> (actual time=0.062..0.064 rows=1 loops=10240)
>                              InitPlan 
>                                ->  Limit  (cost=0.00..0.04 rows=1
> width=4) (actual time=0.047..0.050 rows=1 loops=10240)
>                                      ->  Index Scan using test_id_key
> on test t3  (cost=0.00..121.98 rows=3413 width=4) (actual time=
> 0.038..0.038 rows=1 loops=10240)
>                                            Index Cond: (id > $0)
>                                            Filter: (id IS NOT NULL)
>  Total runtime: 1052.802 ms
> (18 rows)
> Time: 1056.740 ms
> 
> postgres=# explain analyze
> postgres-# select
> postgres-#     distinct on (t1.id)
> postgres-#     t1.*, t2.*
> postgres-# from
> postgres-#     test t1
> postgres-#     join test t2 on t2.id > t1.id
> postgres-# order by t1.id asc, t2.id asc;
> Cancel request sent
> ERROR:  canceling statement due to user request 
> postgres=#
> 
> 
> 
> On 6/26/07, Rafal Pietrak <rafal@xxxxxxxxxxxxxxxxxx> wrote:
>         OK. Have tried this one.... looks like close to 6 times slower
>         then the
>         'non-standard' phrase with 'distinct on'.
>         
>         On the small dataset that I've included in my original post
>         (ten rows of
>         data within TEST), I've run both queries through EXPLAIN
>         ANALYSE, with 
>         the following result summary (for clearity, I've cut away the
>         details
>         from EXPLAIN output):
>         
>         -----------STANDARD
>         Total runtime: 10.660 ms
>         -----------DISTINCT-ON
>         Total runtime: 1.479 ms
>         ----------- 
>         
>         Would there be ways to optimise the standard query to get the
>         performance closer to the none-standard one?
>         
>         
>         -R
>         
>         
>         On Tue, 2007-06-26 at 18:05 +0530, Gurjeet Singh wrote:
>         > Hi Rafal,
>         > 
>         >     Just a note that this is not standard SQL... 'distinct
>         on' is an
>         > extension to SQL provided by postgres.
>         >
>         > Following query utilizes the standard SQL to get the same
>         results:
>         > 
>         > select    t1.id as id, t2.id as "id+1",
>         >         t1.thread as thread, t2.thread as "thread+1",
>         >         t1.info as info, t2.info as "info+1"
>         > from test as t1, test as t2
>         > where t2.id = ( select min(id) from test as t3 where t3.id >
>         t1.id);
>         >
>         > HTH
>         > --
>         > gurjeet[.singh]@EnterpriseDB.com
>         > singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
>         >
>         > 17°29'34.37"N  78°30' 59.76"E - Hyderabad *
>         > 18°32'57.25"N  73°56'25.42 "E - Pune
>         >
>         > Sent from my BlackLaptop device
>         >
>         > On 6/26/07, Rafal Pietrak <rafal@xxxxxxxxxxxxxxxxxx> wrote:
>         >         Marvelous! Thenx!
>         >
>         >         -R
>         >
>         >         On Tue, 2007-06-26 at 10:06 +0200, hubert depesz
>         lubaczewski
>         >         wrote:
>         >         > On 6/26/07, Rafal Pietrak <
>         rafal@xxxxxxxxxxxxxxxxxx> wrote:
>         >         >         Is there an SQL construct to get it?
>         >         >
>         >         > select
>         >         >     distinct on ( t1.id)
>         >         >     t1.*, t2.*
>         >         > from
>         >         >     test t1
>         >         >     join test t2 on t2.id > t1.id
>         >         > order by t1.id asc, t2.id asc
>         >         >
>         >         > should do the trick.
>         >         >
>         >         > depesz 
>         >         >
>         >         > --
>         >         > http://www.depesz.com/ - nowy, lepszy depesz
>         >
>         >         ---------------------------(end of
>         >         broadcast)--------------------------- 
>         >         TIP 4: Have you searched our list archives?
>         >
>         >                        http://archives.postgresql.org/
>         >
> 
> 
> 
> -- 
> gurjeet[.singh]@EnterpriseDB.com
> singh.gurjeet@{ gmail | hotmail | indiatimes | yahoo }.com
> 
> 17°29'34.37"N  78°30'59.76"E - Hyderabad *
> 18°32'57.25"N  73°56'25.42"E - Pune 
> 
> Sent from my BlackLaptop device