Re: xmin value from pg_stat_replication

SBob <sbob@xxxxxxxxxxxxxxxxxxxxx> · Wed, 17 Jul 2019 19:03:16 -0400

Thanks for the feedva k,

It turns out we were not looking at master & skave. The slave has no replication slot. We are seeing lots of row versions not getting cleaned up, trying toinvestigate all the possible causes. Is there a way to determine which connections (on the master or the slave) are the cause of the row versions not getting cleaned up?

Thanks

On Wed, Jul 17, 2019, at 4:32 PM, Andres Freund wrote:
Hi,

On 2019-07-17 12:31:29 -0600, S. Bob wrote:
> We have a master and a standby (Streaming replication).
> 
> This query produces a value of approx 1.5million on the master and approx
> 80million on the standby:
> 
> 
> select slot_name, slot_type, database, xmin from pg_replication_slots order
> by age(xmin);

I'd suggest using age(xmin) in the output as well, not just in the ORDER
BY, otherwise it's hard to make sense of the value.

When you say a value of '1.5 million' on the master, how did you
determine that? Because the above doesn't return anything for the
master. The simplest way to determine that is probably:

SELECT max(age(backend_xmin)) FROM pg_stat_activity;

When you say '80 million', do you mean that pg_replication_slots.xmin
was 80 million, or age(pg_replication_slots.xmin)?  If it's the former,
then that would indicate that the standby is *not* a problem, but that
you have a long-running transaction or such on the primary.

Basically, age(xmin) has to be large, not xmin itself.

> We are running replication in async mode.
> 
> 
> A couple of questions:
> 
> 
> - Is it unusual for the master & standby to differ dramatically like this?

Do you have very longrunning queries on the standby or primary? That's
typically the reason why xmin is old.

> - could the replication slot (i.e. the xmin on the standby) be preventing
> vacuum from cleaning up row versions on the master?

Yes.

Greetings,

Andres Freund