Re: Subquery WHERE IN or WHERE EXISTS faster?

"Sergio Gabriel Rodriguez" <sgrodriguez@xxxxxxxxx> · Sat, 5 Jul 2008 09:14:05 -0300



Hi Ulrich, do you try with

SELECT p.speed FROM processor p
           INNER JOIN users_processors up ON p.id=up.processorid
           AND up.userid=1
?
Or your question is only about IN and EXIST?

regards,

Sergio Gabriel Rodriguez
Corrientes - Argentina
http://www.3trex.com.ar

On Sat, Jun 28, 2008 at 7:07 PM, Ulrich <ulrich.mierendorff@xxxxxxx> wrote:
> Hi,
> I have added a bit of dummy Data, 100000 processors, 10000 users, each user
> got around 12 processors.
>
> I have tested both queries. First of all, I was surprised that it is that
> fast :) Here are the results:
>
>
> EXPLAIN ANALYZE SELECT speed FROM processors WHERE id IN (SELECT processorid
> FROM users_processors WHERE userid=4040) ORDER BY speed ASC LIMIT 10 OFFSET
> 1;
>
> Limit  (cost=113.73..113.75 rows=7 width=5) (actual time=0.335..0.340
> rows=10 loops=1)
>  ->  Sort  (cost=113.73..113.75 rows=8 width=5) (actual time=0.332..0.333
> rows=11 loops=1)
>        Sort Key: processors.speed
>        Sort Method:  quicksort  Memory: 17kB
>        ->  Nested Loop  (cost=47.22..113.61 rows=8 width=5) (actual
> time=0.171..0.271 rows=13 loops=1)
>              ->  HashAggregate  (cost=47.22..47.30 rows=8 width=4) (actual
> time=0.148..0.154 rows=13 loops=1)
>                    ->  Bitmap Heap Scan on users_processors
>  (cost=4.36..47.19 rows=12 width=4) (actual time=0.074..0.117 rows=13
> loops=1)
>                          Recheck Cond: (userid = 4040)
>                          ->  Bitmap Index Scan on
> users_processors_userid_index  (cost=0.00..4.35 rows=12 width=0) (actual
> time=0.056..0.056 rows=13 loops=1)
>                                Index Cond: (userid = 4040)
>              ->  Index Scan using processors_pkey on processors
>  (cost=0.00..8.28 rows=1 width=9) (actual time=0.006..0.007 rows=1 loops=13)
>                    Index Cond: (processors.id =
> users_processors.processorid)
> Total runtime: 0.471 ms
> (13 rows)
>
> ___________
>
> EXPLAIN ANALYZE SELECT speed FROM processors WHERE EXISTS (SELECT 1 FROM
> users_processors WHERE userid=4040 AND processorid=processors.id) ORDER BY
> speed ASC LIMIT 10 OFFSET 1;
>
> Limit  (cost=831413.86..831413.89 rows=10 width=5) (actual
> time=762.475..762.482 rows=10 loops=1)
>  ->  Sort  (cost=831413.86..831538.86 rows=50000 width=5) (actual
> time=762.471..762.473 rows=11 loops=1)
>        Sort Key: processors.speed
>        Sort Method:  quicksort  Memory: 17kB
>        ->  Seq Scan on processors  (cost=0.00..830299.00 rows=50000 width=5)
> (actual time=313.591..762.411 rows=13 loops=1)
>              Filter: (subplan)
>              SubPlan
>                ->  Index Scan using users_processors_pkey on
> users_processors  (cost=0.00..8.29 rows=1 width=0) (actual time=0.006..0.006
> rows=0 loops=100000)
>                      Index Cond: ((userid = 4040) AND (processorid = $0))
> Total runtime: 762.579 ms
> (10 rows)
>
>
>
>
> As you can see the second query is much slower. First I thought "Just a
> difference of 0.3ms?", but then I realized that it was 762ms not 0.762 ;-).
> Both queries return the same result, so I will use #1 and count(*) takes
> just 0.478ms if I use query #1.
>
> Kind Regards,
> Ulrich
>
> Tom Lane wrote:
>>
>> Ulrich <ulrich.mierendorff@xxxxxxx> writes:
>>
>>>
>>> People say that [EXISTS is faster]
>>>
>>
>> People who say that are not reliable authorities, at least as far as
>> Postgres is concerned.  But it is always a bad idea to extrapolate
>> results on toy tables to large tables --- quite aside from measurement
>> noise and caching issues, the planner might pick a different plan when
>> faced with large tables.  Load up a realistic amount of data and then
>> see what you get.
>>
>>                        regards, tom lane
>>
>>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>