autoanalyze creates bad plan, manual analyze fixes it?

Robert Treat <rob@xxxxxxxxxx> · Mon, 14 Sep 2020 19:11:12 -0400

Howdy folks,

Recently i've run into a problem where autoanalyze is causing a query
plan to flip over to using an index which is about 10x slower, and the
problem is fixed by running an alayze manually. some relevant info:

UPDATE sleeping_intents SET
raptor_after='2020-09-14T19:21:03.581106'::timestamp,
status='requires_capture',
updated_at='2020-09-14T16:21:03.581104+00:00'::timestamptz WHERE
sleeping_intents.id = 'r2d2dcc0-8a44-4d19-c3p0-28522233b836'::uuid AND
sleeping_intents.status = 'finna' RETURNING *;

The plan generated after autoanalyze is:

Update on sleeping_intents  (cost=0.70..7.11 rows=1 width=272) (actual
time=57.945..57.945 rows=0 loops=1)
   Buffers: shared hit=43942
   ->  Index Scan using
sleeping_intents_status_created_at_raptor_after_idx on
sleeping_intents  (cost=0.70..7.11 rows=1 width=272) (actual
time=57.943..57.943 rows=0 loops=1)
         Index Cond: (status = 'init'::text)
         Filter: (id = 'r2d2dcc0-8a44-4d19-c3p0-28522233b836'::uuid)
         Rows Removed by Filter: 1262
         Buffers: shared hit=43942
 Planning time: 0.145 ms
 Execution time: 57.981 ms

after i run analyze manually, the query plan is changed to this:

Update on sleeping_intents  (cost=0.57..8.59 rows=1 width=272) (actual
time=0.023..0.023 rows=0 loops=1)
   Buffers: shared hit=7
   ->  Index Scan using sleeping_intents_pkey on sleeping_intents
(cost=0.57..8.59 rows=1 width=272) (actual time=0.022..0.022 rows=0
loops=1)
         Index Cond: (id = 'r2d2dcc0-8a44-4d19-c3p0-28522233b836'::uuid)
         Filter: (status = 'init'::text)
         Rows Removed by Filter: 1
         Buffers: shared hit=7
 Planning time: 0.092 ms
 Execution time: 0.066 ms

Note that in the second query, it switches back to using the primary
key index, which does seem like the logically better choice, even
though it shows a higher estimated cost than the "bad" case
(understanding the cost must change somewhere in the process, but
there no way to see it afaict).

In trying to determine why it switches, I dug up some likely useful info:
Index definitions:
 (20 GB) "sleeping_intents_pkey" PRIMARY KEY, btree (id)
 (37 GB) "sleeping_intents_status_created_at_raptor_after_idx" btree
(status, created_at DESC, raptor_after DESC)

Basic info on the table:
> select relid::regclass, n_live_tup,n_mod_since_analyze,analyze_count,autoanalyze_count from pg_stat_user_tables where relname='sleeping_intents';
      relid      | n_live_tup | n_mod_since_analyze | analyze_count |
autoanalyze_count
-----------------+------------+---------------------+---------------+-------------------
 sleeping_intents |  491171179 |             1939347 |             4 |
               80

(that num mods is in the last ~5 hours, the table is fairly active,
although on a relatively small portion of the data)

Statistics after manual analyze:
       tablename    |    attname    | null_frac | avg_width |
n_distinct | correlation |                   most_common_freqs
-----------------+---------------+-----------+-----------+------------+-------------+--------------------------------------------------------
 sleeping_intents | id            |         0 |        16 |         -1
| -0.00133045 | [null]
 sleeping_intents | status        |         0 |         9 |          6
|    0.848468 | {0.918343,0.0543667,0.0267567,0.000513333,1e-05,1e-05}
 sleeping_intents | created_at    |         0 |         8 |         -1
|    0.993599 | [null]
 sleeping_intents | raptor_after | 0.0663433 |         8 |  -0.933657
|     0.99392 | [null]

In a previous go around with this table, I also increased the
statistics target for the id column to 1000, vs 100 which is the
database default.

Originally I was mostly interested in trying to understand why it
would choose something other than the non-pk index, which sort of
feels like a bug; what could be faster than seeking an individual
entry in a pk index? There are cases where it might make sense, but
this doesn't seem like one (even accounting for the infrequency of the
status we are looking for, which is 1e-05, the disparity in index size
should push it back to the pk imho, unless I am not thinking through
correlation enough?).

However, it also seems very odd that this problem occurs at all. In
the last couple of times this has happened, the manual analyze has
been run within ~30-45 minutes of the auto-analyze, and while the data
is changing, it isn't changing that rapidly that this should make a
significant difference, but I don't see any other reason that
autoanalyze would produce a different result than manual analyze.

All that said, any insight on the above two items would be great, but
the most immediate concern would be around suggestions for preventing
this from happening again?

Thanks in advance,

Robert Treat
https://xzilla.net