On 8/5/22 11:50, Danny Shemesh wrote: > Hey all ! > > I'm on a quest to help the planner (on pg14) use the best of several > partial, expressional indices we have on some large tables (few TBs in > size, billions of records). > > As we know, stats for expressions in partial indices aren't gathered by > default - so I'm tinkering with expressional extended stats to cover for > those. > > I've tackled two interesting points there: > 1. Seems like expressional stats involving the equality operator are > skipped or mismatched (fiddle > <https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/5379>) > Let's take the following naive example: > /create table t1 (x integer[]); > insert into t1 select array[1]::integer[] from generate_series(1, > 100000, 1); > create statistics s1 on (x[1] = 1) from t1; > analyze t1; > / > /explain analyze select * from t1 where x[1] = 1;/ > /> Seq Scan on t1 (cost=0.00..1986.00 rows=500 width=29) (actual > time=0.009..36.035 rows=100000 loops=1)/ > / > / > Now, of course one can just create the stat on x[1] directly in this > case, but I have a more complex use case where an equality operator is > beneficial; > should the above case be supported ? feels like I'm just missing > something fundamental. > Hmmm. The problem here is that the expression may be interpreted either as an operator clause (A=B), or as a boolean expression ((A=B)=TRUE). In principle we might check which option matches extended statistics, but that's not for free :-( So what the current code does is trying to match the more specific cases first, leaving the "bool expression" as a last resort. That seems like a reasonable trade off to me, but I'll think about this a bit more. There are probably other ways to match expressions, and we can't possibly explore all of them. For example you may create statistics on (x=1 AND y=2) and I doubt we'll match that, because we'll try matching individual clauses not some arbitrary combinations of clauses. (Maybe we shouldn't even allow creating such statistics ...) > 2. Less important, just a minor note - feel free to ignore - although > the eq. operator above seems to be skipped when matching the ext. stats, > I can work around this by using a CASE expression (fiddle > <https://www.db-fiddle.com/f/wJZNH1rNwJSo3D5aByQiWX/1>); > Building on the above example, we can: > /create statistics s2 on (case x[1] when 1 then true else false end) > from t1;/ > /explain analyze select * from t1 where (case x[1] when 1 then true else > false end/ > /> Seq Scan on t1 (cost=0.00..1986.00 rows=100000 width=25) (actual > time=0.011..33.721 rows=100000 loops=1)/ > / Yes, because this end ups not being matches as opclause, and therefore goes all the way to the boolvarsel() in clause_selectivity_ext. > / > What's a bit problematic here, though, is that if we mix other dependent > columns to the extended stat, and specifically if we create an mcv, > queries involving the CASE expression throw with `error: unknown clause > type 130`, where clause type == T_CaseExpr. > > The second point for me would be that I've found it a bit non intuitive > that creating an extended statistic can fail queries at query time; it > makes sense that the mcv wouldn't work for case expressions, but it > might've been a bit clearer to: > > a. Fail this at statistic creation time, potentially, or > b. Convert the type numeric in the above error to its text > representation, if we can extract it out at runtime somehow - > I couldn't find a mapping of clause type numerics to their names, and as > the node tags are generated at compile time, it could be build-dependent > and a bit hard to track down if one doesn't control the build flags > Yeah, this seems like a clear bug - we should not fail queries like this. It's a sign statext_is_compatible_clause() and the MCV code disagrees which clauses are compatible. Can you share an example triggering this? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company