Steve Kehlet wrote:
> I have a database that was upgraded from 9.4.1 to 9.4.2 (no pg_upgrade, we
> just dropped new binaries in place) but it wouldn't start up. I found this
> in the logs:
>
> waiting for server to start....2015-05-27 13:13:00 PDT [27341]: [1-1] LOG: database system was shut down at 2015-05-27 13:12:55 PDT
> 2015-05-27 13:13:00 PDT [27342]: [1-1] FATAL: the database system is starting up
> .2015-05-27 13:13:00 PDT [27341]: [2-1] FATAL: could not access status of transaction 1

Today I am debugging a problem that looks very similar to this. AFAICT
the problem is that WAL replay of an online checkpoint in which multixact
files are removed fails, because replay tries to read a file that has
already been removed.

(I was nervous about removing the check that avoided reading pg_multixact
files during recovery. Looks like my hunch was right, though the actual
problem is not the one I was fearing.)

I think the fix is to verify whether the file exists on disk before
reading it; if it doesn't, assume the truncation has already happened and
that it's not necessary to remove it.

> I found [this report from a couple days ago](https://bugs.archlinux.org/task/45071)
> from someone else that looks like the same problem.

Right :-(

I think a patch like this should be able to fix it ... not tested yet.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index 9568ff1..bb8cbd7 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -2208,6 +2208,12 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid)
 	 * to one. It will instead point to the multixact ID that will be
 	 * assigned the next time one is needed.
 	 *
+	 * Note that when this is called during xlog replay, the required files
+	 * might have already been removed, and it would be an error to try to read
+	 * them. To work around this, we test the file for existence before trying
+	 * to read it; if the file doesn't exist, we just don't read it. We trust
+	 * that a further call to this routine later will set things straight.
+	 *
 	 * NB: oldest_dataminmxid is the oldest multixact that might still be
 	 * referenced from a table, unlike in DetermineSafeOldestOffset, where we
 	 * do this same computation based on the oldest value that might still
@@ -2217,16 +2223,24 @@ SetMultiXactIdLimit(MultiXactId oldest_datminmxid, Oid oldest_datoid)
 	 * new multixacts, which requires the old ones to have first been
 	 * truncated away by a checkpoint.
 	 */
-	LWLockAcquire(MultiXactGenLock, LW_SHARED);
-	if (MultiXactState->nextMXact == oldest_datminmxid)
-	{
-		oldestOffset = MultiXactState->nextOffset;
-		LWLockRelease(MultiXactGenLock);
-	}
-	else
 	{
+		MultiXactId nextMulti;
+		MultiXactOffset nextOffset;
+		int pageno;
+
+		/* grab data that requires lock first */
+		LWLockAcquire(MultiXactGenLock, LW_SHARED);
+		nextMulti = MultiXactState->nextMXact;
+		nextOffset = MultiXactState->nextOffset;
 		LWLockRelease(MultiXactGenLock);
-		oldestOffset = find_multixact_start(oldest_datminmxid);
+
+		pageno = MultiXactIdToOffsetPage(oldest_datminmxid);
+
+		if ((nextMulti != oldest_datminmxid) &&
+			(!InRecovery || SimpleLruDoesPhysicalPageExist(pageno)))
+			oldestOffset = find_multixact_start(oldest_datminmxid);
+		else
+			oldestOffset = nextOffset;
 	}
 
 	/* Grab lock for just long enough to set the new limit values */
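The standalone C sketch below may make the new branch easier to follow. It runs the same decision outside the server and rests on several assumptions not taken from the patch:

- Default build constants: 8192-byte blocks, 4-byte offsets, 32 pages per SLRU segment.
- Segment files are named `%04X` under `pg_multixact/offsets`.

Every name in it is hypothetical, and none of it is PostgreSQL's API. `physical_page_exists` only approximates `SimpleLruDoesPhysicalPageExist`: it uses `stat()` and treats any failure as "file missing". `stub_find_multixact_start` is a dummy stand-in for `find_multixact_start`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* ASSUMED default-build constants; not read from a real PostgreSQL build. */
#define SKETCH_BLCKSZ 8192
#define SKETCH_OFFSETS_PER_PAGE (SKETCH_BLCKSZ / (int) sizeof(uint32_t))
#define SKETCH_PAGES_PER_SEGMENT 32

typedef uint32_t SketchMultiXactId;
typedef uint32_t SketchMultiXactOffset;

/* Hypothetical: page of the offsets SLRU holding the entry for 'multi'. */
static int
offset_page_for(SketchMultiXactId multi)
{
    return (int) (multi / SKETCH_OFFSETS_PER_PAGE);
}

/* Hypothetical: build "dir/XXXX", the segment file containing 'pageno'. */
static void
segment_path_for(const char *dir, int pageno, char *buf, size_t buflen)
{
    int len = snprintf(buf, buflen, "%s/%04X", dir,
                       (unsigned) (pageno / SKETCH_PAGES_PER_SEGMENT));

    assert(len > 0 && (size_t) len < buflen);   /* path must not be truncated */
}

/*
 * Hypothetical approximation of SimpleLruDoesPhysicalPageExist: true if the
 * segment file exists and is long enough to contain the page.  This sketch
 * treats any stat() failure as "file missing, assume already truncated".
 */
static bool
physical_page_exists(const char *dir, int pageno)
{
    char        path[1024];
    struct stat st;
    off_t       page_end;

    segment_path_for(dir, pageno, path, sizeof(path));
    if (stat(path, &st) != 0)
        return false;
    page_end = (off_t) (pageno % SKETCH_PAGES_PER_SEGMENT + 1) * SKETCH_BLCKSZ;
    return st.st_size >= page_end;
}

/* Dummy stand-in for find_multixact_start(); returns an arbitrary value. */
static SketchMultiXactOffset
stub_find_multixact_start(SketchMultiXactId multi)
{
    return multi * 10u;
}

/*
 * Mirrors the patched branch: read the oldest multi's start offset only if it
 * differs from nextMulti and, during recovery, its page is physically present;
 * otherwise fall back to nextOffset.
 */
static SketchMultiXactOffset
choose_oldest_offset(SketchMultiXactId next_multi,
                     SketchMultiXactOffset next_offset,
                     SketchMultiXactId oldest_multi,
                     bool in_recovery,
                     const char *offsets_dir,
                     SketchMultiXactOffset (*read_start) (SketchMultiXactId))
{
    int pageno = offset_page_for(oldest_multi);

    if (next_multi != oldest_multi &&
        (!in_recovery || physical_page_exists(offsets_dir, pageno)))
        return read_start(oldest_multi);
    return next_offset;
}
```

Thanks to short-circuit evaluation, the file is never probed outside recovery. This matches the `!InRecovery ||` test in the patch. During replay, a missing page makes the sketch fall back to `nextOffset`. As the patch comment says, the patch relies on a later call to set things straight.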
-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general