Re: Subquery in a JOIN not getting restricted?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Kevin Grittner wrote:
Jay Levitt<jay.levitt@xxxxxxxxx>  wrote:

I don't get why the GROUP BY in this subquery forces it to scan
the entire users table (seq scan here, index scan on a larger
table) when there's only one row in users that can match:

Are you sure there's a plan significantly faster than 1.3 ms?

Yep!  Watch this:

drop schema if exists jaytest cascade;
create schema jaytest;
set search_path to jaytest;

create table questions (
  id int not null primary key,
  user_id int not null
);
insert into questions
  select generate_series(1,1100), (random()*2000000)::int;

create table users (
  id int not null primary key
);
insert into users select generate_series(1, 2000000);

vacuum freeze analyze;

explain analyze
select questions.id
from questions
join (
  select u.id
  from users as u
  group by u.id
) as s
on s.id = questions.user_id
where questions.id = 1;

-----------------------
Merge Join (cost=8.28..90833.02 rows=1818 width=4) (actual time=888.787..888.790 rows=1 loops=1)
   Merge Cond: (u.id = questions.user_id)
-> Group (cost=0.00..65797.47 rows=2000000 width=4) (actual time=0.017..735.509 rows=1747305 loops=1) -> Index Scan using users_pkey on users u (cost=0.00..60797.47 rows=2000000 width=4) (actual time=0.015..331.990 rows=1747305 loops=1) -> Materialize (cost=8.28..8.29 rows=1 width=8) (actual time=0.013..0.015 rows=1 loops=1) -> Sort (cost=8.28..8.28 rows=1 width=8) (actual time=0.012..0.013 rows=1 loops=1)
               Sort Key: questions.user_id
               Sort Method:  quicksort  Memory: 25kB
-> Index Scan using questions_pkey on questions (cost=0.00..8.27 rows=1 width=8) (actual time=0.006..0.006 rows=1 loops=1)
                     Index Cond: (id = 1)
 Total runtime: 888.832 ms
(11 rows)

explain analyze
select questions.id
from questions
join (
  select u.id
  from users as u
) as s
on s.id = questions.user_id
where questions.id = 1;

-----------------------
Nested Loop (cost=0.00..16.77 rows=1 width=4) (actual time=0.019..0.021 rows=1 loops=1) -> Index Scan using questions_pkey on questions (cost=0.00..8.27 rows=1 width=8) (actual time=0.008..0.009 rows=1 loops=1)
         Index Cond: (id = 1)
-> Index Scan using users_pkey on users u (cost=0.00..8.49 rows=1 width=4) (actual time=0.007..0.007 rows=1 loops=1)
         Index Cond: (u.id = questions.user_id)
 Total runtime: 0.045 ms
(6 rows)

That said, there might be some room for an optimization which pushes
that test into the query with the "group by" clause.  I don't know
if there's a problem with that which I'm missing, the construct was
judged to be too rare to be worth the cost of testing for it, or
it's just that nobody has yet gotten to it.

Anyone have more insights on whether this is hard to optimize or simply not-yet-optimized? And if the latter, where might I start looking? (Not that you -really- want me to submit a patch; my C has regressed to the "try an ampersand. OK, try an asterisk." level...)

Jay

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux