On Sun, Apr 7, 2019 at 2:31 AM Pavel Suderevsky <psuderevsky@xxxxxxxxx> wrote:
> Probably if you advise me what could cause "pg_serial": apparent
> wraparound messages I would have more chances to handle all the
> performance issues.

9.6 has this code:

    /*
     * Give a warning if we're about to run out of SLRU pages.
     *
     * slru.c has a maximum of 64k segments, with 32 (SLRU_PAGES_PER_SEGMENT)
     * pages each. We need to store a 64-bit integer for each Xid, and with
     * default 8k block size, 65536*32 pages is only enough to cover 2^30
     * XIDs. If we're about to hit that limit and wrap around, warn the user.
     *
     * To avoid spamming the user, we only give one warning when we've used 1
     * billion XIDs, and stay silent until the situation is fixed and the
     * number of XIDs used falls below 800 million again.
     *
     * XXX: We have no safeguard to actually *prevent* the wrap-around,
     * though. All you get is a warning.
     */
    if (oldSerXidControl->warningIssued)
    {
        TransactionId lowWatermark;

        lowWatermark = tailXid + 800000000;
        if (lowWatermark < FirstNormalTransactionId)
            lowWatermark = FirstNormalTransactionId;
        if (TransactionIdPrecedes(xid, lowWatermark))
            oldSerXidControl->warningIssued = false;
    }
    else
    {
        TransactionId highWatermark;

        highWatermark = tailXid + 1000000000;
        if (highWatermark < FirstNormalTransactionId)
            highWatermark = FirstNormalTransactionId;
        if (TransactionIdFollows(xid, highWatermark))
        {
            oldSerXidControl->warningIssued = true;
            ereport(WARNING,
                    (errmsg("memory for serializable conflict tracking is nearly exhausted"),
                     errhint("There might be an idle transaction or a forgotten prepared transaction causing this.")));
        }
    }

Did you see that warning at some point before the later error? I
think if you saw that warning, and then later the error you reported,
it's probably just being prudent and avoiding the truncation because
it detects a potential wraparound.

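To make the warn-once behaviour of that code easier to reason about, here is a minimal standalone sketch of the same high/low watermark hysteresis. It is only an illustration, not the server code: the names xid_precedes, xid_follows and check_watermarks are made up for this example, and the comparisons are simplified to plain modulo-2^32 arithmetic (the real TransactionIdPrecedes/TransactionIdFollows also treat special XIDs differently).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define FirstNormalTransactionId ((TransactionId) 3)

/* Simplified modulo-2^32 comparisons; assumes both XIDs are normal XIDs. */
bool xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

bool xid_follows(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) > 0;
}

/*
 * Hypothetical helper mirroring the quoted logic.  Returns true only on the
 * call that would emit the WARNING; *warning_issued carries the state that
 * oldSerXidControl->warningIssued holds in the real code.
 */
bool check_watermarks(bool *warning_issued, TransactionId tail_xid,
                      TransactionId xid)
{
    assert(warning_issued != NULL);

    if (*warning_issued)
    {
        /* Already warned: re-arm once usage drops below 800 million XIDs. */
        TransactionId low = tail_xid + 800000000u;

        if (low < FirstNormalTransactionId)
            low = FirstNormalTransactionId;
        if (xid_precedes(xid, low))
            *warning_issued = false;
        return false;
    }
    else
    {
        /* Not yet warned: warn once usage passes 1 billion XIDs. */
        TransactionId high = tail_xid + 1000000000u;

        if (high < FirstNormalTransactionId)
            high = FirstNormalTransactionId;
        if (xid_follows(xid, high))
        {
            *warning_issued = true;
            return true;
        }
        return false;
    }
}
```

With the tail held at XID 100, a call with an XID just past 1,000,000,100 returns true once, and later calls stay silent until the tail advances far enough that the current XID falls back under tail + 800 million.
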
If it actually does wrap around, then there is a potential for
OldSerXidGetMinConflictCommitSeqNo() to report a too-recent minimum
conflict CSN for a given XID, and I'm not sure what consequence that
would have (without drinking a lot more coffee), but potentially some
kind of incorrect answer. On server restart the problem fixes itself,
because pg_serial is only used to spill state relating to transactions
running in this server lifetime.

I wonder if this condition required you to have a serializable
transaction running (or prepared) while you consumed 2^30, AKA ~1
billion, xids. I think it is unreachable in v11+ because commit
e5eb4fa8 allowed for more SLRU pages to avoid this artificially early
wrap.

Gee, it'd be nice to use FullTransactionId for SERIALIZABLEXACT and
pg_serial in v13 and not to even have to think about wraparound here.
It's more doable here than elsewhere because the data on disk isn't
persistent across server restart, let alone pg_upgrade. Let's see...
each segment file is 256kB and we need to be able to address 2^64 *
sizeof(SerCommitSequenceNumber), so you'd have segment files numbered
from 0 up to 1ffffffffffff (so you'd need slru.c to support
13-character segment names and 64-bit segment numbers, whereas it
currently has a limit of 6 characters in SlruScanDirectory and uses
int for the segment number). You'd be addressing them by
FullTransactionId, but that's just the index used to find entries --
the actual amount of data stored wouldn't change, you'd just start
seeing wider filenames, and all the fragile modulo comparison
truncation stuff would disappear from the tree.

-- 
Thomas Munro
https://enterprisedb.com
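
P.S. The segment-numbering arithmetic above can be checked with a small sketch. This assumes the default 8k block size, 32 pages per segment, and one 8-byte SerCommitSequenceNumber per 64-bit XID; segment_for_fxid and segment_name are hypothetical helpers invented for this example, not slru.c functions, and the uppercase hex naming is only meant to resemble existing SLRU file names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define BLCKSZ                  8192u   /* assumed default block size */
#define SLRU_PAGES_PER_SEGMENT  32u
#define ENTRY_SIZE              8u      /* 64-bit SerCommitSequenceNumber */
#define SEGMENT_BYTES           (BLCKSZ * SLRU_PAGES_PER_SEGMENT)
#define ENTRIES_PER_SEGMENT     (SEGMENT_BYTES / ENTRY_SIZE)

/* Hypothetical: which segment would hold the entry for a 64-bit XID. */
uint64_t segment_for_fxid(uint64_t fxid)
{
    return fxid / ENTRIES_PER_SEGMENT;
}

/*
 * Hypothetical: format a 64-bit segment number as a hex file name.
 * Returns the number of characters the name needs.
 */
int segment_name(uint64_t segno, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "%llX", (unsigned long long) segno);
}
```

Each segment covers 32768 XIDs in 256kB, so the segment for the largest 64-bit XID is 0x1FFFFFFFFFFFF, whose name is 13 hex digits long. That is where the 13-character requirement comes from, compared with the current limit of 6.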