Re: Select in subselect vs select = any array

Adam Tistler <atistler@xxxxxxxxx> · Sun, 20 Mar 2011 23:20:56 -0400

logicops2=# explain analyze select count(*) from nodes where node_id = any(  Array(select node_id from nodes limit 100000) );
                                                               QUERY PLAN                                                                
-----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1718.59..1718.60 rows=1 width=0) (actual time=509.126..509.127 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.00..1637.04 rows=100000 width=4) (actual time=0.010..76.604 rows=100000 loops=1)
           ->  Seq Scan on nodes  (cost=0.00..12355.41 rows=754741 width=4) (actual time=0.008..38.105 rows=100000 loops=1)
   ->  Bitmap Heap Scan on nodes  (cost=42.67..81.53 rows=10 width=0) (actual time=447.274..484.283 rows=100000 loops=1)
         Recheck Cond: (node_id = ANY ($0))
         ->  Bitmap Index Scan on n_node_id_index  (cost=0.00..42.67 rows=10 width=0) (actual time=447.074..447.074 rows=100000 loops=1)
               Index Cond: (node_id = ANY ($0))
 Total runtime: 509.209 ms
(9 rows)

Time: 510.009 ms

logicops2=# explain analyze select count(*) from nodes where node_id in (select node_id from nodes limit 100000);
                                                               QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3017.17..3017.18 rows=1 width=0) (actual time=1052.866..1052.866 rows=1 loops=1)
   ->  Nested Loop  (cost=2887.04..3016.67 rows=200 width=0) (actual time=167.310..1021.540 rows=100000 loops=1)
         ->  HashAggregate  (cost=2887.04..2889.04 rows=200 width=4) (actual time=167.198..251.205 rows=100000 loops=1)
               ->  Limit  (cost=0.00..1637.04 rows=100000 width=4) (actual time=0.008..80.090 rows=100000 loops=1)
                     ->  Seq Scan on nodes  (cost=0.00..12355.41 rows=754741 width=4) (actual time=0.007..41.566 rows=100000 loops=1)
         ->  Index Scan using n_node_id_index on nodes  (cost=0.00..0.63 rows=1 width=4) (actual time=0.006..0.007 rows=1 loops=100000)
               Index Cond: (public.nodes.node_id = public.nodes.node_id)
 Total runtime: 1053.523 ms
(8 rows)

Time: 1054.864 ms

On Mar 20, 2011, at 2:51 AM, Pavel Stehule wrote:

> Hello
> 
> 2011/3/20 Adam Tistler <atistler@xxxxxxxxx>:
>> I have noticed that SELECT ... = ANY(ARRAY(...))  is about twice as fast as SELECT IN ( ... ).
>> Can anyone explain a reason for this?  Results are the bottom and are reproducible.  I can test with other versions if that is necessary.
>> 
> 
> send a result of EXPLAIN ANALYZE SELECT ..., please
> 
> The reasons can be different - less seq scans, indexes
> 
> Regards
> 
> Pavel Stehule
> 
> 
> 
>> ./configure --prefix=/usr/local/pgsql84 --with-openssl --with-perl
>> CentOS release 5.4 (Final)
>> psql (PostgreSQL) 8.4.1
>> 
>> prompt2=# select count(*) from nodes;
>>  count
>> --------
>>  754734
>> (1 row)
>> 
>> 
>> prompt2=# \d nodes
>>                                        Table "public.nodes"
>>    Column    |           Type           |                         Modifiers
>> --------------+--------------------------+-----------------------------------------------------------
>>  node_id      | integer                  | not null default nextval(('node_id_seq'::text)::regclass)
>>  node_type_id | integer                  | not null
>>  template_id  | integer                  | not null
>>  timestamp    | timestamp with time zone | default ('now'::text)::timestamp(6) with time zone
>> Indexes:
>>    "nodes_pkey" PRIMARY KEY, btree (node_id)
>>    "n_node_id_index" btree (node_id)
>>    "n_node_type_id_index" btree (node_type_id)
>>    "n_template_id_index" btree (template_id)
>> 
>> prompt2=# select count(*) from nodes where node_id = any(  Array(select node_id from nodes limit 100000) );
>>  count
>> --------
>>  100000
>> (1 row)
>> 
>> Time: 404.530 ms
>> prompt2=# select count(*) from nodes where node_id = any(  Array(select node_id from nodes limit 100000) );
>>  count
>> --------
>>  100000
>> (1 row)
>> 
>> Time: 407.316 ms
>> prompt2=# select count(*) from nodes where node_id = any(  Array(select node_id from nodes limit 100000) );
>>  count
>> --------
>>  100000
>> (1 row)
>> 
>> Time: 408.728 ms
>> prompt2=# select count(*) from nodes where node_id in (select node_id from nodes limit 100000 );
>>  count
>> --------
>>  100000
>> (1 row)
>> 
>> Time: 793.840 ms
>> prompt2=# select count(*) from nodes where node_id in (select node_id from nodes limit 100000 );
>>  count
>> --------
>>  100000
>> (1 row)
>> 
>> Time: 779.137 ms
>> prompt2=# select count(*) from nodes where node_id in (select node_id from nodes limit 100000 );
>>  count
>> --------
>>  100000
>> (1 row)
>> 
>> Time: 781.820 ms
>> 
>> 
>> --
>> Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-performance
>> 

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance