Thanks for the feedva k,
It turns out we were not looking at master & skave. The slave has no replication slot. We are seeing lots of row versions not getting cleaned up, trying toinvestigate all the possible causes. Is there a way to determine which connections (on the master or the slave) are the cause of the row versions not getting cleaned up?
Thanks
On Wed, Jul 17, 2019, at 4:32 PM, Andres Freund wrote:
Hi,On 2019-07-17 12:31:29 -0600, S. Bob wrote:> We have a master and a standby (Streaming replication).>> This query produces a value of approx 1.5million on the master and approx> 80million on the standby:>>> select slot_name, slot_type, database, xmin from pg_replication_slots order> by age(xmin);I'd suggest using age(xmin) in the output as well, not just in the ORDERBY, otherwise it's hard to make sense of the value.When you say a value of '1.5 million' on the master, how did youdetermine that? Because the above doesn't return anything for themaster. The simplest way to determine that is probably:SELECT max(age(backend_xmin)) FROM pg_stat_activity;When you say '80 million', do you mean that pg_replication_slots.xminwas 80 million, or age(pg_replication_slots.xmin)? If it's the former,then that would indicate that the standby is *not* a problem, but thatyou have a long-running transaction or such on the primary.Basically, age(xmin) has to be large, not xmin itself.> We are running replication in async mode.>>> A couple of questions:>>> - Is it unusual for the master & standby to differ dramatically like this?Do you have very longrunning queries on the standby or primary? That'stypically the reason why xmin is old.> - could the replication slot (i.e. the xmin on the standby) be preventing> vacuum from cleaning up row versions on the master?Yes.Greetings,Andres Freund