Re: Plans with bad estimates/paths

Hi Tomas,

On Tue, 16 Nov 2021, at 22:03, Tomas Vondra wrote:

> It sure smells like a case of correlated data for some clients but not
> others, but who knows ... Hard to say without seeing the nodes below the
> join. If the lower nodes are estimated accurately, then it's the join
> selectivity that is estimated poorly, and there's not much we can do
> about it :-(

Here is a link to a segment of a plan where the estimates degrade.
I've also pasted the plan segment at the end of this message.

https://gist.githubusercontent.com/joewildish/8eb66815d728399687b24647df941746/raw/ef8c62455dd34d76807feff5d164e6d8857014e2/gistfile1.txt

The nodes beneath the Merge Join seem to be estimated accurately (within 2x of the actual, as far as I can tell), but the Merge Join itself is a 10x under-estimate, and the nodes above it are even further out. You can see in the plan that this particular execution is for multiple clients (144, 145, 146 & 155). In my experiments I get better results with a single client, although I don't know whether that is down to the actual data size being smaller, or whether the estimation for multiple values is inherently less accurate. Anyway, is this an example of a join selectivity problem?
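
For reference, the comparison I've been running looks roughly like this (a simplified stand-in for the real query, which is much larger; the table and column names are taken from the plan below):

EXPLAIN ANALYZE
SELECT count(*)
FROM order_items
JOIN clients ON clients.id = order_items.client_id
WHERE clients.id = 144;                                    -- single client

EXPLAIN ANALYZE
SELECT count(*)
FROM order_items
JOIN clients ON clients.id = order_items.client_id
WHERE clients.id = ANY ('{144,145,146,155}'::integer[]);  -- multiple clients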

> Do the "good" plans have the same underestimate? Maybe there's just much
> less data for those clients, and the "poor" plan ends up being fast anyway?

I think that may be happening but haven't been able to capture a plan yet that confirms it.

>> I am assuming this underestimation is the source of the planner
>> choosing the "wrong" path; in production, we have had to resort to
>> setting the join and from collapse limits to 1 to force a naive plan
>> to be generated.  This is giving us execution times in the 10/20
>> second range vs. >45m in some cases.
>
> That may be happening, yes. So is it the join order that ends up being
> wrong, or the join methods?

I've seen both. For example, in the worst-performing plans, the root node has its first input estimated to produce 1 row and its second input estimated to produce c.40,000 rows. The second input is a SeqScan, presumably because of the single-row estimate of its sibling. Of course, the estimate of 1 turns out to be wildly inaccurate, the join produces c.2BN rows, and most are then filtered out.

In other (better) plans, the troublesome SeqScan doesn't appear: the relation in question gets joined lower down the tree and is not scanned sequentially.

> Have you tried increasing the collapse limit
> instead? Although, if it works for some queries but not others, that's
> likely not going to help.

Yes, but it often produces poorer plans rather than better ones. In fact, I increase those limits locally when testing, to get the planner to consistently produce what it thinks is the best plan. (Without this, I find sub-paths can be materialised, presumably because one of the collapse limits has been hit, and I'd rather remove that particular variability when debugging.)
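
For clarity, the settings in question are these (the values I use locally are only illustrative; the default for both is 8):

-- Production workaround: force the naive plan by keeping the written join order
SET join_collapse_limit = 1;
SET from_collapse_limit = 1;

-- Local testing: raise the limits so the planner searches the full join-order space
SET join_collapse_limit = 20;
SET from_collapse_limit = 20;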

> The first thing I'd do is reduce the query size as much as possible. In
> this case I'd try removing as many joins as possible until the issue
> disappears. The simpler the query, the easier it is to investigate.
>
> And yes, replacing parts of a query with a temporary table is a common
> solution, because it's possible to collect statistics on it, build
> indexes etc. That usually solves estimation issues in multi-tenancy.
> Sometimes even a CTE with materialization is enough.

Thank you. It seems, then, that the solution lies in simplifying the queries so that the chance of poor estimation is reduced or removed. (I have had some success with this today: one of the queries was bringing in a view which resulted in needless self-joins.) However, such a solution raises the question of which bits of the query should be pre-computed, and whether that work will survive further changes in the underlying data distributions.
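
As a sketch of what I mean by pre-computing (the table and predicate here are only an illustration based on the plan below, not the actual refactoring):

CREATE TEMP TABLE tmp_client_order_items AS
SELECT client_id, order_item_id
FROM order_items
WHERE client_id = ANY ('{144,145,146,155}'::integer[]);

ANALYZE tmp_client_order_items;

-- The main query then joins against tmp_client_order_items, which now has its
-- own statistics. A materialized CTE (WITH x AS MATERIALIZED (...)) is the
-- lighter-weight alternative Tomas mentions, at the cost of having no
-- statistics or indexes on the intermediate result.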

Thanks,
-Joe

Plan segment:

Nested Loop Left Join  (cost=2.29..348095.52 rows=3 width=93) (actual time=828.619..3368.362 rows=517367 loops=1)
  ->  Nested Loop  (cost=1.86..348094.00 rows=3 width=81) (actual time=828.609..2655.136 rows=517367 loops=1)
        Join Filter: (clients.id = order_items.client_id)
        ->  Nested Loop  (cost=1.43..347875.73 rows=400 width=60) (actual time=828.603..1890.900 rows=517367 loops=1)
              Join Filter: (clients.id = order_items_1.client_id)
              ->  Merge Join  (cost=1.00..322712.24 rows=50370 width=48) (actual time=828.584..1224.993 rows=517367 loops=1)
                    Merge Cond: (order_items_2.client_id = clients.id)
                    ->  GroupAggregate  (cost=0.85..290718.67 rows=2518498 width=44) (actual time=0.040..1126.298 rows=1856351 loops=1)
                          Group Key: order_items_2.client_id, order_items_2.order_item_id
                          ->  Merge Left Join  (cost=0.85..240348.71 rows=2518498 width=18) (actual time=0.033..535.466 rows=1858076 loops=1)
                                Merge Cond: ((order_items_2.client_id = discount_allocations.client_id) AND (order_items_2.order_item_id = discount_allocations.order_item_id))
                                ->  Index Only Scan using pk_order_items on order_items order_items_2  (cost=0.43..131145.90 rows=2518498 width=12) (actual time=0.022..150.068 rows=1856352 loops=1)
                                      Heap Fetches: 0
                                ->  Index Scan using pk_discount_allocations on discount_allocations  (cost=0.42..89823.14 rows=931889 width=18) (actual time=0.008..110.223 rows=677531 loops=1)
                    ->  Index Only Scan using pk_clients on clients  (cost=0.14..8.64 rows=4 width=4) (actual time=0.003..0.011 rows=4 loops=1)
                          Index Cond: (id = ANY ('{144,145,146,155}'::integer[]))
                          Heap Fetches: 0 
              ->  Index Only Scan using pk_order_items on order_items order_items_1  (cost=0.43..0.49 rows=1 width=12) (actual time=0.001..0.001 rows=1 loops=517367)
                    Index Cond: ((client_id = order_items_2.client_id) AND (order_item_id = order_items_2.order_item_id))
                    Heap Fetches: 0 
        ->  Index Scan using pk_order_items on order_items  (cost=0.43..0.53 rows=1 width=29) (actual time=0.001..0.001 rows=1 loops=517367)
              Index Cond: ((client_id = order_items_1.client_id) AND (order_item_id = order_items_1.order_item_id))
  ->  Index Scan using pk_order_item_variants on order_item_variants  (cost=0.43..0.51 rows=1 width=20) (actual time=0.001..0.001 rows=1 loops=517367)
        Index Cond: ((client_id = order_items_1.client_id) AND (order_item_id = order_items_1.order_item_id))




