On 11/16/21 9:22 PM, Joe Wildish wrote: > Hello, > > I am struggling with a problem that appears planning related. I'm > hoping folk here may be able to advise on how best to tackle the > issue. > > We have a system that ingests JSON messages containing order data. > The data from these messages is inserted into a normalised table > structure; "order", "order_item", "order_discount", etc. Orders can > have multiple items, items can have multiple discounts and so forth. > Referential constraints exist between the tables in the expected way. > Fields in the tables are all declared NOT NULL. If the schema of the > JSON is such that a field is optional, we reflect that optionality in > another "order_<X>" table, and make "order_<X>" be a subset of > "order". Moreover, we ingest messages for different > clients/customers. Therefore each message-related table carries with > it a client identifier which forms part of the primary key on the > table. For example, "order" has a key of "(client_id, order-id)". > > We have written transformations that calculate various facts about > orders. For example, one of the transforms emits order item data > where we have calculated the overall discount for the item, and have > "filled in" some of the optional fields with defaults, and have > categorised the order item on the basis of some aspect of the order > (e.g. "this is an e-commerce order, this is retail order"). These > transforms are typically per client (e.g. `WHERE client_id = 123`) > although in some cases we transform over multiple clients (e.g. > `WHERE client_id = ANY (SELECT client_id FROM clients WHERE ...)`). > > The issue is that for some clients, or combination of clients, the > planner is choosing a path that takes substantially longer to > evaluate than the plan it typically chooses for other clients. The > number of tables being joined is in the region of 15. There is an > extended statistic object in place to help the one aggregation that > occurs (defined on the basis of the `GROUP BY` columns) to try and > get a better estimate of the likely number of rows emitted. However, > what I am often seeing in the explain plan is that the estimated rows > is small and the actuals are significantly larger e.g. > > Merge Join (cost=1.14..253250.32 rows=1099 width=69) (actual > time=1268.587..2400.353 rows=4282355 loops=1) > It sure smells like a case of correlated data for some clients but not others, but who knows ... Hard to say without seeing the nodes below the join. If the lower nodes are estimated accurately, then it's the join selectivity that is estimated poorly, and there's not much we can do about it :-( Do the "good" plans have the same underestimate? Maybe there's just much less data for those clients, and the "poor" plan ends up being fast anyway? > I am assuming this underestimation is the source of the planner > choosing the "wrong" path; in production, we have had to resort to > setting the join and from collapse limits to 1 to force a naive plan > to be generated. This is giving us execution times in the 10/20 > second range vs. >45m in some cases. > That may be happening, yes. So is it the join order that ends up being wrong, or the join methods? Have you tried increasing the collapse limit instead? Although, if it works for some queries but not others, that's likely not going to help. > (a) Do you have any suggestions on a general approach to tackling the > problem? For example, one option might be to pre-compute some of the > subqueries that are occurring in the transforms, write the results > into their own tables, and substitute those tables in place of the > subqueries in the main transform. Is this something people typically > do in this situation? > The first thing I'd do is reduce the query size as much as possible. In this case I'd try removing as many joins as possible until the issue disappears. The simpler the query, the easier it is to investigate. And yes, replacing parts of a query with a temporary table is a common solution, because it's possible to collect statistics on it, build indexes etc. That usually solves estimation issues in multi-tenancy. Sometimes even a CTE with materialization is enough. > (b) Do I need to provide a schema and explain plans to get any > concrete advice on how to proceed? > That would be helpful, particularly after making the query as small as possible. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company