We had a mysterious (to us) slowdown today that I'm hoping someone can explain just based on PG's principles of operation. It got better by itself so it seems like it was "normal" behavior -- I just don't know what behavior it was exhibiting.
We have a table of user notifications containing about 80 million rows. It gets a lot of inserts continually, and is cleaned up once a day. There are no updates. In all history there have been about 330 million rows created.
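For reference, the table looks roughly like this. This is a simplified sketch, not our actual DDL: the real table has 11 columns, and only the two that appear in the index discussed below are named here; the column types are my best guess.

```sql
-- Simplified sketch of the table (real table has 11 columns);
-- only the indexed columns are shown by name, types assumed.
CREATE TABLE notifications (
    id         bigserial PRIMARY KEY,
    person_id  bigint NOT NULL,
    created_at timestamptz NOT NULL
    -- ... other columns omitted ...
);

CREATE INDEX index_notifications_on_person_id_and_created_at
    ON notifications (person_id, created_at);
```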
Today we deleted about 15 million rows in one transaction from this table. Immediately afterwards, a particular SELECT started running very slowly -- 500 to 3000 ms rather than the usual <1ms.
We ran EXPLAIN ANALYZE on this SELECT, and it was still doing an index scan as usual. The *planning time* for the query is what had gotten slow; the query itself was still executing in <1ms.
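To illustrate the pattern: the query is a simple index lookup along these lines (the WHERE clause and the literal value here are invented for illustration), and the EXPLAIN ANALYZE output split the time the way shown in the comments. The timings in the comments are representative of what we saw, not a verbatim paste:

```sql
-- Illustrative query shape; person_id value is made up.
EXPLAIN ANALYZE
SELECT *
FROM notifications
WHERE person_id = 12345
ORDER BY created_at DESC
LIMIT 20;

-- Output looked roughly like (numbers representative, not actual):
--   Index Scan using index_notifications_on_person_id_and_created_at ...
--   Planning Time: 2100.000 ms   <-- this is what blew up
--   Execution Time: 0.600 ms     <-- this stayed fast
```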
Over the next few hours the time slowly improved, until it returned to the former performance. You can see a graph at https://imgur.com/a/zIfqkF5.
Is this sort of thing expected after a large delete, and if so, can someone explain the mechanism behind it? I've looked for an explanation of what could cause this kind of excess planning time and haven't found one. I'm hoping someone will just recognize what's going on here.
Here is the pg_class data for the table and its index:

Table:
  relname        = notifications
  relpages       = 2799880
  reltuples      = 7.15229e+07
  relallvisible  = 1219791
  relkind        = r
  relnatts       = 11
  relhassubclass = f
  reloptions     =
  pg_table_size  = 22943326208

Index:
  relname        = index_notifications_on_person_id_and_created_at
  relpages       = 473208
  reltuples      = 7.03404e+07
  relallvisible  = 0
  relkind        = i
  relnatts       = 2
  relhassubclass = f
  reloptions     =
  pg_table_size  = 3877494784
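(In case it matters how these were gathered: the figures above came from a catalog query along these lines; this is a reconstruction of the query, not the exact one we ran.)

```sql
-- Pull planner-relevant stats for the table and its index from pg_class.
SELECT c.relname,
       c.relpages,
       c.reltuples,
       c.relallvisible,
       c.relkind,
       c.relnatts,
       c.relhassubclass,
       c.reloptions,
       pg_table_size(c.oid)
FROM pg_class AS c
WHERE c.relname IN ('notifications',
                    'index_notifications_on_person_id_and_created_at');
```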
Thanks,
Walter