Tomas, thank you! The query:

    select 1 from data as d, data_class as dc
     where dc.data_id = d.id and d.id > 205284974
       and dc.data_id > 205284974   -- new condition
     order by d.id
     limit 1000;

totally solved it - it is now fast under all conditions! I thought that
the optimizer would be able to infer it itself. Thank you.
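For reference, the before/after plans can be compared with something
like the following (the same query as above, run once with and once
without the new dc.data_id condition); the interesting part is whether
the join flips from the merge join back to a nested loop:

    explain (analyze, buffers)
    select 1 from data as d, data_class as dc
     where dc.data_id = d.id and d.id > 205284974
       and dc.data_id > 205284974   -- new condition
     order by d.id
     limit 1000;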
On Thu, Feb 3, 2022 at 4:24 PM Tomas Vondra
<tomas.vondra@xxxxxxxxxxxxxxxx> wrote:
>
> On 2/3/22 20:32, A Shaposhnikov wrote:
> > I made a mistake yesterday claiming that the created statistics
> > changed the row counts in the estimates - it did not - I looked at
> > the wrong query yesterday. In the correct query plan the row
> > estimate still differs from the actual by many orders of magnitude:
> >
> >    Nested Loop  (cost=1.01..27584834.53 rows=59608439 width=164)
> >                 (actual time=0.047..2.723 rows=854 loops=1)
> >
> > It completely ignores the "limit" in the SQL statement when doing
> > the row estimates - the estimates do not change when I change it.
> > The business logic really needs only 1000 rows in the result; I
> > never need the whole table. The query always runs fast if the limit
> > is <= 46; when the limit is >= 47, it sometimes chooses to do a
> > merge join on two tables with hundreds of millions of rows instead
> > of using the indexes. The runtime difference is 4000x.
> >
>
> Yes, it ignores LIMIT while calculating the estimates - the estimates
> below the LIMIT node are as if the query executed to completion. But
> the "actual" values are affected by the LIMIT, because the execution
> terminates early. So you can't compare these two values and conclude
> the estimates are off.
>
> Let me illustrate this with an example:
>
>    create table t (a int, b int);
>    insert into t select i, i from generate_series(1,1000000) s(i);
>    create index on t (a,b);
>    vacuum analyze t;
>
>    explain analyze select b from t order by a limit 10;
>
>                               QUERY PLAN
>    --------------------------------------------------------------------
>     Limit  (cost=0.42..0.79 rows=10 width=8)
>            (actual time=0.018..0.033 rows=10 loops=1)
>       ->  Index Only Scan using t_a_b_idx on t
>           (cost=0.42..36214.93 rows=1000000 width=8)
>           (actual time=0.016..0.021 rows=10 loops=1)
>             Heap Fetches: 10
>     Planning Time: 0.047 ms
>     Execution Time: 0.049 ms
>
> The database has *perfect* stats in this case, and the estimates are
> spot on too. But comparing rows=1000000 and rows=10 would lead to the
> bogus conclusion that the estimates are wrong.
>
> Which is why I suggested getting an explain for the query without the
> limit clause.
>
> > I migrated the data to the latest PostgreSQL 14.1. Both versions
> > run either the slow plan or the fast plan seemingly at random. The
> > only reliable way to make it choose the fast plan is to decrease
> > the effective_cache_size to a value 20x lower than the memory
> > available in the system. Dropping and creating the statistics on
> > the join table makes no difference in the estimated row counts and
> > query plans. PostgreSQL seems to be caching something internally
> > and choosing the query plans at random - sometimes it is fast,
> > sometimes 4000x slower, without any changes in configuration or
> > statistics. Such randomness is clearly unacceptable. I think I
> > would have to try to use the "hint" extension suggested by Imre to
> > make it consistently choose the fast plan.
> >
>
> Does the plan change depending on parameters in the query? Or does it
> change if you run the same query (including the same parameters in
> all the conditions)?
>
> One possible explanation is that there's some sort of correlation
> between parameters. LIMIT assumes the matching rows are distributed
> uniformly in the input, and if that's not true (due to some sort of
> correlation), it may pick the wrong plan.
>
> In this case there are joins, so the correlation may be more
> complicated - e.g. through a join condition, between (dc.class_id,
> dc.data_id) or something like that.
>
> This part of the plan hints this might be the case, because it forces
> scanning a huge part of the data_class table just to get to the
> interesting data_id values:
>
>    ->  Index Only Scan using data_class_pkey on data_class ta
>        (cost=0.57..4935483.78 rows=216964862 width=8)
>        (actual time=0.018..35022.908 rows=151321889 loops=1)
>          Heap Fetches: 151321889
>
> That's hard to fix, because there's no way to tell the database about
> such dependencies. I'd try two things:
>
> (1) Try putting the same condition on the ID on data_class too. I
> mean, if you do
>
>    select 1 from data as d, data_class as dc
>     where dc.data_id = d.id and d.id > 205284974
>     order by d.id
>     limit 1000;
>
> then clearly dc.data_id > 205284974, but the database does not
> realize it and will scan the index from the start for the merge join.
> A nested loop does direct lookups for individual ID values, so it
> doesn't have this issue.
>
> So try this:
>
>    select 1 from data as d, data_class as dc
>     where dc.data_id = d.id and d.id > 205284974
>       and dc.data_id > 205284974   -- new condition
>     order by d.id
>     limit 1000;
>
> This is a special case of correlation between the tables, possibly
> confusing the LIMIT estimation. We are smarter about similar cases
> (with conditions on join keys), but only for equalities.
>
> (2) The other thing I'd try is vacuuming the data_class table. The
> high number of heap fetches suggests the visibility map is in pretty
> poor shape, which is not great for index-only scans.
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
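P.S. Tomas's second suggestion (getting the visibility map on
data_class into better shape so the index-only scan stops doing heap
fetches) would be something along these lines - just a sketch, since
autovacuum settings may already cover it:

    -- refresh the visibility map and statistics for data_class,
    -- then re-check the Heap Fetches count in the plan
    vacuum analyze data_class;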