Re: Apparent missed query optimization with self-join and inner grouping

Tom Lane <tgl@xxxxxxxxxxxxx> · Sat, 01 Aug 2020 14:31:09 -0400

Zack Weinberg <zackw@xxxxxxxxx> writes:
> For each (experiment_id, url_id) pair for some small subset of the
> experiment_ids, I need to query the full_url_id corresponding to the
> *largest* value of redirect_num.  The query planner does something
> reasonable with this SELECT:

> => explain (analyze, verbose)
>    select b.experiment_id, b.url_id, b.full_url_id
>      from blockpage b,
>           (select experiment_id, url_id, max(redirect_num) as redirect_num
>              from blockpage group by experiment_id, url_id) bm
>     where b.experiment_id = bm.experiment_id
>       and b.url_id = bm.url_id
>       and b.redirect_num = bm.redirect_num
>       and bm.experiment_id in (16, 43);

With that query, the "bm.experiment_id in (16, 43)" restriction is
pushed into the "bm" sub-select, greatly reducing the amount of work
the GROUP BY step has to do.

> But if I change the final part of the WHERE to reference
> b.experiment_id instead of bm.experiment_id, I get this much more
> expensive query plan:

> => explain (analyze, verbose)
>    select b.experiment_id, b.url_id, b.full_url_id
>      from blockpage b,
>           (select experiment_id, url_id, max(redirect_num) as redirect_num
>              from blockpage group by experiment_id, url_id) bm
>     where b.experiment_id = bm.experiment_id
>       and b.url_id = bm.url_id
>       and b.redirect_num = bm.redirect_num
>       and b.experiment_id in (16, 43);

With that query, the GROUP BY is evaluated in full, and it costs you.

In principle, given the nearby "where b.experiment_id = bm.experiment_id"
clause, we could derive "bm.experiment_id in (16, 43)" from the stated
clause.  But we don't.  The existing machinery around derivation of
implied equalities only works for simple equalities, not OR clauses.
Extending that would be a bit of a research project, and it's far from
clear that the benefits would be worth the additional planning costs.

> What is the best way to report this to the developers?  Should I file
> a bug report?  I'm using Postgres 12.2.

This is not a bug, and you should not hold your breath waiting
for it to change.

			regards, tom lane