Le mer. 12 oct. 2022 à 08:56, Guillaume Lelarge <guillaume@xxxxxxxxxxxx> a écrit :
Le mar. 11 oct. 2022 à 19:42, Guillaume Lelarge <guillaume@xxxxxxxxxxxx> a écrit :Le mar. 11 oct. 2022 à 18:27, Alvaro Herrera <alvherre@xxxxxxxxxxxxxx> a écrit :On 2022-Oct-11, Tom Lane wrote:
> Are there any tables in this query where extremal values of the join
> key are likely to be in recently-added or recently-dead rows? Does
> VACUUM'ing on the primary help?
I remember having an hypothesis, upon getting a report of this exact
problem on a customer system once, that it could be due to killtuple not
propagating to standbys except by FPIs. I do not remember if we proved
that true or not. I do not remember observing that tables were being
read, however.
Thanks for your answers.The last predicate is "and c3>='2020-10-10' and c3<='2022-10-10'. I have no idea on the actual use of c3 but rows with c3 at '2022-10-10' (which is yesterday) is much probably recently-added. I can ask my customer if you want but this looks like a pretty safe bet.On the VACUUM question, I didn't say, but we're kind of wondering if it was lacking a recent-enough VACUUM. So they're doing a VACUUM tonight on the database (and especially on the 1.6TB table which is part of the query). I'm kind of skeptical because if the VACUUM wasn't enough on the standby, it should be the same on the primary.It appears that I was wrong. I just got an email from my customer saying they got their performance back after a VACUUM on the two main tables of the query. I'll have them on the phone in about an hour. I'll probably know more then. Still wondering why it was an issue on the standby and not on the primary. VACUUM cleans up tables and indexes, and this activity goes through WAL, doesn't it?
Just finished my phone call. So, they definitely have their performance back. All they did was a VACUUM on two tables.
If I understand correctly, during "normal" operations, some information is stored on the primary and sent to standbys. For some reason, only the primary take them into account, standbys ignore them. That would explain why, when we promoted a standby without doing anything else, it had much better performance. VACUUM fixes the issue on a standby, probably by storing this information in a different way. After VACUUM, standbys stop ignoring this information, which helps get the performance back.
That sounds like a plausible explanation. I still have questions if you don't mind:
* what is this information?
* where is it stored? my guess would be indexes
* why is it ignored on standbys and used on primary?
We didn't talk much about releases, so I guess that the "standby-ignores-some-information" part is currently on all available releases?
Thank you.
--
Guillaume.