Thanks Tom. Collecting full statistics on the tables involved corrected the execution plan.
On Tue, Aug 13, 2024 at 9:57 AM Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
Siraj G <tosiraj.g@xxxxxxxxx> writes:
> We migrated a PgSQL database from Cloud SQL to Compute Engine, and since
> then there is a query we have observed taking a long time. After some study, I
> found that the query is using a NESTED LOOP whose cost is too high.
The core of your problem seems to be here:
> ->  Index Scan using marketing_a_cancel__55ffff_idx on marketing_app_leadhistory w0  (cost=0.57..4274.30 rows=1 width=8) (actual time=46.678..51.232 rows=44 loops=1)
>       Index Cond: ((cancel_event_id IS NOT NULL) AND (cancel_event_type = 1))
>       Filter: ((status_id = 93) AND ((followup_date)::date >= '2024-08-01'::date) AND ((followup_date)::date <= '2024-08-07'::date))
>       Rows Removed by Filter: 22268
>       Buffers: shared hit=9170 read=19
If the planner had estimated 40-some rows out of this step, rather
than one, it would certainly not have chosen to use nestloop joins
atop this. So the big problem to focus on is making that estimate
better.
A secondary problem is that the choice of index seems poor: the
index itself is selecting 44+22268 = 22312 rows and then the filter
condition is throwing away 99.8% of those rows. Probably, using
an index on (status_id, followup_date) would have worked better.
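As a sketch (index name is invented; table and column names are taken from the plan above), such an index could be created like this. Note that because the query casts followup_date to date, a plain index on the raw column may still be usable only via a filter; an expression index matches the written predicate more directly, provided followup_date is a plain timestamp (the timestamp-to-date cast is immutable; the timestamptz-to-date cast is not and cannot be indexed this way):

```sql
-- Composite index on the equality column plus the range column.
CREATE INDEX CONCURRENTLY leadhistory_status_followup_idx
    ON marketing_app_leadhistory (status_id, followup_date);

-- Alternative sketch: expression index matching the (followup_date)::date
-- predicate exactly (only valid if followup_date is timestamp, not timestamptz).
-- CREATE INDEX CONCURRENTLY leadhistory_status_followup_date_idx
--     ON marketing_app_leadhistory (status_id, ((followup_date)::date));
```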
I suspect that both of these things are tied to the non-normalization
of your "cancel" condition. The planner probably believes that
"cancel_event_id IS NOT NULL" is statistically independent of
"cancel_event_type = 1"; but I'll bet it isn't, and thus the index
condition selects many more rows than the planner guessed. You might
be able to improve that estimate by creating extended stats on both of
those columns, but really a better idea would be to take a step back
and figure out if those two columns can't be merged into one.
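For the extended-statistics route, a sketch would look like the following (statistics object name is invented; MCV-type extended statistics can cover IS NULL / IS NOT NULL and equality clauses, while dependency statistics apply only to equality clauses, so the IS NOT NULL condition is the reason to include mcv here):

```sql
-- Teach the planner about the correlation between the two "cancel" columns.
CREATE STATISTICS leadhistory_cancel_stats (dependencies, mcv)
    ON cancel_event_id, cancel_event_type
    FROM marketing_app_leadhistory;

-- Extended statistics are only populated by ANALYZE.
ANALYZE marketing_app_leadhistory;
```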
regards, tom lane