On Thu, Aug 13, 2015 at 10:09 AM, Josh Berkus <josh@xxxxxxxxxxxx> wrote:
Setup:
* PostgreSQL 9.3.9
* 1 master, 1 replica
* Tiny database, under 0.5GB, completely cached in shared_buffers
* 90% read query traffic, which is handled by replica
* Traffic in the 1000's QPS.
The weirdness:
Periodically the master runs an "update all rows" query on the main
table in the database. When this update hits the replica via
replication stream, *some* (about 5%) of the queries which do seq scans
will stall for 22 to 32 seconds (these queries normally take about
75ms). Queries which do index scans seem not to be affected.
Thing is, the update all rows only takes 2.5 seconds to execute on the
master. So even if the update is blocking the seq scans on the replica
(and I can't see why it would), it should only block them for < 3 seconds.
Anyone seen anything like this?
Sounds like another manifestation of this thread: "Planner performance extremely affected by an hanging transaction (20-30 times)?"
Each backend doing a seqscan must, for every tuple whose visibility is not yet resolved, take the proc array lock and scan the proc array to see whether the relevant transaction has committed yet. Near the end of the bulk update the number of such checks approaches 2*reltuples, because every row then has both an old and a new version, so one xmax from the former and one xmin from the latter must be checked. This creates profound contention on the lock: every scanning backend is looping over every other backend's entry for every tuple.
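A rough model of that cost (the names `ProcArray`, `xid_in_progress`, and `seqscan_visible` are illustrative assumptions, not PostgreSQL's actual code, which lives in the heap visibility routines and procarray.c):

```python
import threading

class ProcArray:
    """Toy stand-in for PostgreSQL's shared proc array."""
    def __init__(self, running_xids):
        self.lock = threading.Lock()       # models ProcArrayLock
        self.running = set(running_xids)   # xids of in-progress transactions
        self.scans = 0                     # how many locked lookups we paid for

    def xid_in_progress(self, xid):
        with self.lock:                    # every visibility check contends here
            self.scans += 1
            return xid in self.running

def seqscan_visible(tuples, proc_array):
    """Count tuples visible to a scan. Each tuple carries an xmin
    (creating xid) and an optional xmax (deleting xid). While a
    whole-table update is unresolved, both versions of every row
    force proc-array lookups, so the scan pays on the order of
    2*reltuples trips through the locked lookup."""
    visible = 0
    for xmin, xmax in tuples:
        if proc_array.xid_in_progress(xmin):
            continue                       # creator still running: invisible
        if xmax is not None and not proc_array.xid_in_progress(xmax):
            continue                       # deleter committed: row is gone
        visible += 1
    return visible
```

With the bulk update (xid 100, say) still in flight, each logical row contributes an old version `(xmin=50, xmax=100)` and a new version `(xmin=100, xmax=None)`, so a table of 1000 rows costs 3000 locked lookups per scan, and every concurrent scanner fights over the same lock.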
Once the commit of the whole-table update has replayed, the problem should go away almost instantly: at that point each backend doing the seqscan will find that the transaction has committed and will set the hint bit, which means all the seqscan backends that come after it can skip the proc array scan for that tuple.
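A sketch of that hint-bit short-circuit (simplified and with assumed names, not PostgreSQL's actual tuple-header code): the first reader to see the committed xmin pays the shared lookup once, then caches the verdict on the tuple so later scans never consult the proc array for it again.

```python
XMIN_COMMITTED = 0x01   # hypothetical hint-bit flag

def xmin_committed(tup, running_xids, stats):
    """Return True if the tuple's creating transaction has committed.
    `running_xids` stands in for the shared proc array; `stats`
    counts how often we had to take the slow, locked path."""
    if tup["hints"] & XMIN_COMMITTED:
        return True                      # fast path: hint already set, no lock
    stats["procarray_lookups"] += 1      # slow path: consult shared state
    committed = tup["xmin"] not in running_xids
    if committed:
        tup["hints"] |= XMIN_COMMITTED   # cache verdict for all later readers
    return committed
```

After the commit replays, the first scan over each tuple sets the hint and every subsequent scan takes the lock-free fast path, which is why the stalls should vanish abruptly rather than taper off.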
So perhaps the commit of the whole-table update is slow to replay because the startup process is also getting bogged down on the same contended lock? I don't know how hard WAL replay hits the proc array lock.
Cheers,
Jeff