Plans with bad estimates/paths

"Joe Wildish" <joe@xxxxxxxxxxxxxxx> · Tue, 16 Nov 2021 20:22:44 +0000

Hello,

I am struggling with a problem that appears planning related. I'm hoping folk here may be able to advise on how best to tackle the issue.

We have a system that ingests JSON messages containing order data. The data from these messages is inserted into a normalised table structure; "order", "order_item", "order_discount", etc.  Orders can have multiple items, items can have multiple discounts and so forth.  Referential constraints exist between the tables in the expected way.  Fields in the tables are all declared NOT NULL.  If the schema of the JSON is such that a field is optional, we reflect that optionality in another "order_<X>" table, and make "order_<X>" be a subset of "order".  Moreover, we ingest messages for different clients/customers.  Therefore each message-related table carries with it a client identifier which forms part of the primary key on the table.  For example, "order" has a key of "(client_id, order-id)".

We have written transformations that calculate various facts about orders.  For example, one of the transforms emits order item data where we have calculated the overall discount for the item, and have "filled in" some of the optional fields with defaults, and have categorised the order item on the basis of some aspect of the order (e.g. "this is an e-commerce order, this is retail order").  These transforms are typically per client (e.g. `WHERE client_id = 123`) although in some cases we transform over multiple clients (e.g. `WHERE client_id = ANY (SELECT client_id FROM clients WHERE ...)`).

The issue is that for some clients, or combination of clients, the planner is choosing a path that takes substantially longer to evaluate than the plan it typically chooses for other clients.  The number of tables being joined is in the region of 15.  There is an extended statistic object in place to help the one aggregation that occurs (defined on the basis of the `GROUP BY` columns) to try and get a better estimate of the likely number of rows emitted.  However, what I am often seeing in the explain plan is that the estimated rows is small and the actuals are significantly larger e.g.

  Merge Join  (cost=1.14..253250.32 rows=1099 width=69) (actual time=1268.587..2400.353 rows=4282355 loops=1)

I am assuming this underestimation is the source of the planner choosing the "wrong" path; in production, we have had to resort to setting the join and from collapse limits to 1 to force a naive plan to be generated.  This is giving us execution times in the 10/20 second range vs. >45m in some cases.

(a) Do you have any suggestions on a general approach to tackling the problem? For example, one option might be to pre-compute some of the subqueries that are occurring in the transforms, write the results into their own tables, and substitute those tables in place of the subqueries in the main transform. Is this something people typically do in this situation?

(b) Do I need to provide a schema and explain plans to get any concrete advice on how to proceed?

Any advice/suggestions would be much appreciated.

Thanks,
-Joe