Transaction Id Space, Freezing and Wraparound

Martín Fernández <fmartin91@xxxxxxxxx> · Tue, 20 Nov 2018 11:23:46 -0800

Hello everyone, second time writing to this awesome mailing list.

I’m helping manage a postgresql 9.2.24 high volume transaction database and yesterday we were literally one hour away of having to deal with transaction id wraparound. We were really lucky about identifying the issue one hour before getting materialized, that gave us some time to reduce the transaction load so that we could run VACUUM FREEZE on our hottest table. By running VACUUM FREEZE on our hottest table we were able to move our oldest txid by almost 1 billion. 

After almost experiencing transaction id wraparound I decided to start investigating deeper around how postgresql works with transactions ids and wraparound in general. 

First thing that generated a lot of noise in my head was the following, if pg assigns contiguous numeric values for the txid, how does pg deal with fragmentation issues ? Then I later found that the txid space is actually circular and not linearly as I originally thought it was. This exposed me to the fact that the txid is actually exported as a 64bit value where the last 32bits are an epoch. My understanding is that the epoch is the component that allows the circularity of the data structure. I then started analyzing how pg decides if a given tuple is eligible for freezing.  I found out that pg will compare a cutoff_tx (I assume this is the last committed tx) with the xmin value of the given tuple, if xmin precedes the cutoff_tx the tuple is eligible (I’m ignoring HEAP_XMAX_IS_MULTI and HEAP_MOVED cases). Now, the xmin of a tuple is an 32 bit integer, so, how is the epoch part of an exported txid considered here ? What if we had a database really old where a txid with integer value 10 is greater than a txid of value 1000 ? 

I’m probably missing something here so bare with me if my previous explanation doesn’t make sense at all.

As usual, thanks before hand, any insight will be appreciated. 

PD: I based my research looking at 9.2.24 code base.

Best,
Martín