Re: Subquery in a JOIN not getting restricted?

Jay Levitt <jay.levitt@xxxxxxxxx> · Mon, 07 Nov 2011 20:56:42 -0500

Jay Levitt wrote:
And yep! When I do a CREATE TABLE AS from that view, and add an index on
user_id, it works just as I'd like.

Or not.  Feel free to kick me back over to pgsql-novice, but I don't get why 
the GROUP BY in this subquery forces it to scan the entire users table (seq 
scan here, index scan on a larger table) when there's only one row in users 
that can match:

create table questions (
  id int not null primary key,
  user_id int not null
);
insert into questions
  select generate_series(1,1100), (random()*2000)::int;

create table users (
  id int not null primary key
);
insert into users select generate_series(1, 2000);

vacuum freeze analyze;

explain analyze
select questions.id
from questions
join (
  select u.id
  from users as u
  group by u.id
) as s
on s.id = questions.user_id
where questions.id = 1;

Hash Join  (cost=42.28..89.80 rows=2 width=4) (actual time=0.857..1.208 
rows=1 loops=1)
   Hash Cond: (u.id = questions.user_id)
   ->  HashAggregate  (cost=34.00..54.00 rows=2000 width=4) (actual 
time=0.763..1.005 rows=2000 loops=1)
         ->  Seq Scan on users u  (cost=0.00..29.00 rows=2000 width=4) 
(actual time=0.003..0.160 rows=2000 loops=1)
   ->  Hash  (cost=8.27..8.27 rows=1 width=8) (actual time=0.015..0.015 
rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Index Scan using questions_pkey on questions  (cost=0.00..8.27 
rows=1 width=8) (actual time=0.012..0.013 rows=1 loops=1)
               Index Cond: (id = 1)
 Total runtime: 1.262 ms

This is on patched 9.0.5 built earlier today.  The real query has 
aggregates, so it really does need GROUP BY.. I think..

Jay

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance