Hey everyone,
There's something I've been wondering about - to my understanding,
the planner won't use statistics collected on partial indices, as they may or may not reflect the correct distribution of data.
When using expressional indices which are also partial, a-la an index on a nested path of a jsonb column, or a function, with a where clause -
the planner won't actually use the collected statistics, and as such, the index may end up unused or misused.
Since postgres 14, it's now possible to create extended statistics on an _expression_, thus one could create a statistic on the _expression_ used in the partial index;
In previous versions, I believe the best one can do is either create a full index on the _expression_, so the statistics would be utilized, or to extract the _expression_ to a dedicated column.
Concretely, this came from an index we've had on a relatively large table (2TB~) on jsonb->some->>path;
the table serves several types of data, where some->>path is only relevant for data type A, which is only a fraction of the total rows - thus an index on type = A made lots of sense.
However, the collected statistics weren't used, even if the query contains type = A, and thus the usage of the index is a bit random,
it's either utilized when it shouldn't, I believe due to the nature of the hard-code estimation multipliers for the operators we use, or it's unused when it would've been appropriate.
Two questions that came to mind were:
1. Are there any other actions one could take in pg < 14 (we're in 12, specifically), avoiding creating a full index and extracting said fields to a dedicated column ?
2. Would it be theoretically possible to use the collected statistics if the index where clause is also specified in the query itself ?
or in other words, if the index only contains records where x is not null, and the query also filters on x is not null, would the partial distribution not be safe to use ?
Thank you for your time !
Danny