Tomas, thank you! The query:

    select 1 from data as d, data_class as dc
     where dc.data_id = d.id and d.id > 205284974
       and dc.data_id > 205284974   -- new condition
     order by d.id
     limit 1000;

totally solved it - it is now fast under all conditions! I thought that
the optimizer would be able to infer it itself. Thank you.
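For reference, the before/after plans can be compared with something
like the following (the same query as above, run once with and once
without the new dc.data_id condition); the interesting part is whether
the join flips from the merge join back to a nested loop:

    explain (analyze, buffers)
    select 1 from data as d, data_class as dc
     where dc.data_id = d.id and d.id > 205284974
       and dc.data_id > 205284974   -- new condition
     order by d.id
     limit 1000;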
On Thu, Feb 3, 2022 at 4:24 PM Tomas Vondra
<tomas.vondra@xxxxxxxxxxxxxxxx> wrote:
>
> On 2/3/22 20:32, A Shaposhnikov wrote:
> > I made a mistake yesterday claiming that the created statistics
> > changed the row counts in the estimates - it did not - I looked at
> > the wrong query yesterday. In the correct query plan the row
> > estimate still differs from the actual by many orders of magnitude:
> >
> >    Nested Loop  (cost=1.01..27584834.53 rows=59608439 width=164)
> >                 (actual time=0.047..2.723 rows=854 loops=1)
> >
> > It completely ignores the "limit" in the SQL statement when doing
> > the row estimates - the estimates do not change when I change it.
> > The business logic really needs only 1000 rows in the result; I
> > never need the whole table. The query always runs fast if the limit
> > is <= 46; when the limit is >= 47, it sometimes chooses to do a
> > merge join on two tables with hundreds of millions of rows instead
> > of using the indexes. The runtime difference is 4000x.
> >
>
> Yes, it ignores LIMIT while calculating the estimates - the estimates
> below the LIMIT node are as if the query executed to completion. But
> the "actual" values are affected by the LIMIT, because the execution
> terminates early. So you can't compare these two values and conclude
> the estimates are off.
>
> Let me illustrate this with an example:
>
>    create table t (a int, b int);
>    insert into t select i, i from generate_series(1,1000000) s(i);
>    create index on t (a,b);
>    vacuum analyze t;
>
>    explain analyze select b from t order by a limit 10;
>
>                               QUERY PLAN
>    --------------------------------------------------------------------
>     Limit  (cost=0.42..0.79 rows=10 width=8)
>            (actual time=0.018..0.033 rows=10 loops=1)
>       ->  Index Only Scan using t_a_b_idx on t
>           (cost=0.42..36214.93 rows=1000000 width=8)
>           (actual time=0.016..0.021 rows=10 loops=1)
>             Heap Fetches: 10
>     Planning Time: 0.047 ms
>     Execution Time: 0.049 ms
>
> The database has *perfect* stats in this case, and the estimates are
> spot on too. But comparing rows=1000000 and rows=10 would lead to the
> bogus conclusion that the estimates are wrong.
>
> Which is why I suggested getting an explain for the query without the
> limit clause.
>
> > I migrated the data to the latest PostgreSQL 14.1. Both versions
> > run either the slow plan or the fast plan seemingly at random. The
> > only reliable way to make it choose the fast plan is to decrease
> > the effective_cache_size to a value 20x lower than the memory
> > available in the system. Dropping and creating the statistics on
> > the join table makes no difference in the estimated row counts and
> > query plans. PostgreSQL seems to be caching something internally
> > and choosing the query plans at random - sometimes it is fast,
> > sometimes 4000x slower, without any changes in configuration or
> > statistics. Such randomness is clearly unacceptable. I think I
> > would have to try to use the "hint" extension suggested by Imre to
> > make it consistently choose the fast plan.
> >
>
> Does the plan change depending on parameters in the query? Or does it
> change if you run the same query (including the same parameters in
> all the conditions)?
>
> One possible explanation is that there's some sort of correlation
> between parameters. LIMIT assumes the matching rows are distributed
> uniformly in the input, and if that's not true (due to some sort of
> correlation), it may pick the wrong plan.
>
> In this case there are joins, so the correlation may be more
> complicated - e.g. through a join condition, between (dc.class_id,
> dc.data_id) or something like that.
>
> This part of the plan hints this might be the case, because it forces
> scanning a huge part of the data_class table just to get to the
> interesting data_id values:
>
>    ->  Index Only Scan using data_class_pkey on data_class ta
>        (cost=0.57..4935483.78 rows=216964862 width=8)
>        (actual time=0.018..35022.908 rows=151321889 loops=1)
>          Heap Fetches: 151321889
>
> That's hard to fix, because there's no way to tell the database about
> such dependencies. I'd try two things:
>
> (1) Try putting the same condition on the ID on data_class too. I
> mean, if you do
>
>    select 1 from data as d, data_class as dc
>     where dc.data_id = d.id and d.id > 205284974
>     order by d.id
>     limit 1000;
>
> then clearly dc.data_id > 205284974, but the database does not
> realize it and will scan the index from the start for the merge join.
> A nested loop does direct lookups for individual ID values, so it
> doesn't have this issue.
>
> So try this:
>
>    select 1 from data as d, data_class as dc
>     where dc.data_id = d.id and d.id > 205284974
>       and dc.data_id > 205284974   -- new condition
>     order by d.id
>     limit 1000;
>
> This is a special case of correlation between the tables, possibly
> confusing the LIMIT estimation. We are smarter about similar cases
> (with conditions on join keys), but only for equalities.
>
> (2) The other thing I'd try is vacuuming the data_class table. The
> high number of heap fetches suggests the visibility map is in pretty
> poor shape, which is not great for index-only scans.
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
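P.S. Tomas's second suggestion (getting the visibility map on
data_class into better shape so the index-only scan stops doing heap
fetches) would be something along these lines - just a sketch, since
autovacuum settings may already cover it:

    -- refresh the visibility map and statistics for data_class,
    -- then re-check the Heap Fetches count in the plan
    vacuum analyze data_class;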