On Mon, Aug 23, 2021 at 08:53:15PM -0400, Matt Dupree wrote:
> Is it possible that the row estimate is off because of a column other than
> time?

I would test this by writing the simplest query that reproduces the
mis-estimate.

> I looked at the # of events in that time period and 1.8 million is
> actually a good estimate. What about the
> ((strpos(other_events_1004175222.hierarchy, '#close_onborading;'::text) <>
> 0) condition in the filter? It makes sense that Postgres wouldn't have a
> way to estimate how selective this condition is.

The issue I see is here.  I don't know where else I'd start but to
understand this.

| Index Scan using other_events_1004175222_pim_evdef_67951aef14bc_idx on public.other_events_1004175222 (cost=0.28..1,648,877.92 ROWS=1,858,891 width=32) (actual time=1.008..15.245 ROWS=23 loops=1)
|   Output: other_events_1004175222.user_id, other_events_1004175222."time", other_events_1004175222.session_id
|   Index Cond: ((other_events_1004175222."time" >= '1624777200000'::bigint) AND (other_events_1004175222."time" <= '1627369200000'::bigint))
|   Buffers: shared read=25

This node has no "Filter" condition; it's a scan node with a bad
over-estimate.  Note that this is due to the table's column stats, not any
index's stats, so every plan is affected, even though some happen to work
well.  The consequences of over-estimates are not as terrible as for
under-estimates, but it's bad to start with inputs that are off by 10^5.

-- 
Justin
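
A minimal sketch of such a test, assuming the table and column names from the
plan above (the statistics target of 1000 is only an illustrative value, not a
recommendation):

    -- Reproduce the mis-estimate in isolation: only the "time" range condition.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*)
    FROM other_events_1004175222
    WHERE "time" >= 1624777200000 AND "time" <= 1627369200000;

    -- Inspect what the planner currently knows about the column.
    SELECT null_frac, n_distinct, histogram_bounds
    FROM pg_stats
    WHERE tablename = 'other_events_1004175222' AND attname = 'time';

    -- Raise the column's statistics target (default is 100; 1000 is an
    -- illustrative value) and refresh the table's statistics.
    ALTER TABLE other_events_1004175222 ALTER COLUMN "time" SET STATISTICS 1000;
    ANALYZE other_events_1004175222;

If the stripped-down EXPLAIN still shows rows=1,858,891 estimated against a
couple dozen actual rows, that confirms the column stats on "time" are the
input that's off, independent of the strpos() filter.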