On Wed, May 27, 2015 at 6:21 PM, Alvaro Herrera <alvherre@xxxxxxxxxxxxxxx> wrote:
> Steve Kehlet wrote:
>> I have a database that was upgraded from 9.4.1 to 9.4.2 (no pg_upgrade, we
>> just dropped new binaries in place) but it wouldn't start up. I found this
>> in the logs:
>>
>> waiting for server to start....2015-05-27 13:13:00 PDT [27341]: [1-1] LOG:
>> database system was shut down at 2015-05-27 13:12:55 PDT
>> 2015-05-27 13:13:00 PDT [27342]: [1-1] FATAL: the database system is
>> starting up
>> .2015-05-27 13:13:00 PDT [27341]: [2-1] FATAL: could not access status of
>> transaction 1
>
> I am debugging today a problem currently that looks very similar to
> this. AFAICT the problem is that WAL replay of an online checkpoint in
> which multixact files are removed fails because replay tries to read a
> file that has already been removed.

Hmm, so what exactly is the sequence of events here? It's possible
that I'm not thinking clearly just now, but it seems to me that if
we're replaying the same checkpoint we replayed previously, the offset
of the oldest multixact will be the first file that we didn't remove.
However, I can see that there could be a problem if we try to replay
an older checkpoint after having already replayed a new one - for
example, if a standby replays checkpoint A truncating the members
multixact and performs a restart point, and then replays checkpoint B
truncating the members multixact again but without performing a
restartpoint, and then is shut down, it will resume replay from
checkpoint A, and trouble will ensue. Is that the scenario, or is
there something else?

> I think the fix to this is to verify whether the file exists on disk
> before reading it; if it doesn't, assume the truncation has already
> happened and that it's not necessary to remove it.

That might be an OK fix, but this implementation doesn't seem very clean.
If we're going to remove the invariant that
MultiXactState->oldestOffset will always be valid after replaying a
checkpoint, then we should be explicit about that and add a flag
indicating whether or not it's currently valid. Shoving nextOffset in
there and hoping that's good enough seems like a bad idea to me.

I think we should modify the API for find_multixact_start. Let's have
it return a Boolean and return oldestOffset via an out parameter. If
!InRecovery, it will always return true and set the out parameter; but
if in recovery, it is allowed to return false without setting the out
parameter. Both values can get stored in MultiXactState, and we can
adjust the logic elsewhere to disregard oldestOffset when the
accompanying flag is false.

This still leaves open an ugly possibility: can we reach normal
running without a valid oldestOffset? If so, until the next
checkpoint happens, autovacuum has no clue whether it needs to worry.
There's got to be a fix for that, but it escapes me at the moment.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
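
For illustration, here is a rough standalone sketch of the
find_multixact_start API change proposed above. It is not the actual
multixact.c code and not a patch: the struct, the oldestOffsetKnown
flag name, the in_recovery and segment_on_disk variables, and the
lookup_offset helper are all made-up stand-ins, and the offset
arithmetic is fake. It only shows the shape of "return a Boolean, hand
back the offset via an out parameter, and store a validity flag next
to oldestOffset".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uint32_t MultiXactId;
typedef uint32_t MultiXactOffset;

/* Simplified stand-in for MultiXactState: only the fields under discussion. */
typedef struct
{
    MultiXactOffset oldestOffset;      /* meaningful only when the flag is true */
    bool            oldestOffsetKnown; /* hypothetical validity flag */
} SketchMultiXactState;

/* Hypothetical stand-ins for InRecovery and "the offsets file still exists". */
bool in_recovery = false;
bool segment_on_disk = true;

/* Fake lookup; the real code would read the offset from the offsets SLRU. */
MultiXactOffset
lookup_offset(MultiXactId multi)
{
    return multi * 10;
}

/*
 * Returns true and sets *result when the starting offset can be read.
 * In recovery it may return false, leaving *result untouched, because
 * the file may already have been removed by an earlier truncation.
 */
bool
find_multixact_start_sketch(MultiXactId multi, MultiXactOffset *result)
{
    assert(result != NULL);

    if (!segment_on_disk)
    {
        if (in_recovery)
            return false;

        /* Outside recovery a missing file is a genuine error. */
        fprintf(stderr, "could not access status of transaction %u\n",
                (unsigned) multi);
        exit(EXIT_FAILURE);
    }

    *result = lookup_offset(multi);
    return true;
}

/* Store both the offset and its validity flag, as suggested above. */
void
set_oldest_offset_sketch(SketchMultiXactState *state, MultiXactId oldestMulti)
{
    MultiXactOffset offset;

    if (find_multixact_start_sketch(oldestMulti, &offset))
    {
        state->oldestOffset = offset;
        state->oldestOffsetKnown = true;
    }
    else
        state->oldestOffsetKnown = false;   /* callers must ignore oldestOffset */
}
```

Any code that consults oldestOffset would first have to test the flag
and skip its check while the flag is false. The sketch does nothing
about the open question of reaching normal running with the flag still
false.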