On Thu, May 28, 2015 at 8:51 AM, Robert Haas <robertmhaas@xxxxxxxxx> wrote:
> [ speculation ]

OK, I finally managed to reproduce this, after some off-list help from
Steve Kehlet (the reporter), Alvaro, and Thomas Munro. Here's how to
do it:

1. Install any pre-9.3 version of the server and generate enough
multixacts to create at least TWO new segments. When you shut down
the server, all segments except for the most current one will be
removed. At this point, the only thing in $PGDATA/pg_multixact/offsets
should be a single file, and the name of that file should not be 0000
or 0001.

2. Use pg_upgrade to upgrade to 9.3.4. It is possible that versions
< 9.3.4 will also work here, but you must not use 9.3.5 or higher,
because 9.3.5 includes Bruce's commit 3d2e18510, which arranged to
preserve relminmxid and datminmxid values. At this point,
pg_controldata on the new cluster should show an oldestMultiXid value
greater than 1 (copied from the old cluster), but all the datminmxid
values are 1. Also, initdb will have left behind a bogus 0000 file in
pg_multixact/offsets.

3. Move to 9.3.5 (or 9.3.6), not via pg_upgrade, but just by dropping
in the new binaries. Follow the instructions in the 9.3.5 release
notes; since you created at least TWO new segments in step one, there
will be no 0001 file, and the query there will say that you should
remove the bogus 0000 file. So do that, leaving just the good file in
pg_multixact/offsets. At this point, pg_multixact/offsets is OK, and
pg_controldata still says that oldestMultiXid > 1, so that is also OK.
The only problem is that we've got some bogus datminmxid values
floating around. Our next step will be to convince vacuum to
propagate the bogus datminmxid values back into pg_controldata.

4. Consume at least one transaction ID (e.g.
SELECT txid_current()) and then do this:

postgres=# set vacuum_freeze_min_age = 0;
SET
postgres=# set vacuum_freeze_table_age = 0;
SET
postgres=# vacuum;
VACUUM

Setting the GUCs forces full table scans, so that we advance
relfrozenxid. But notice that we were careful not to just run VACUUM
FREEZE, which would have also advanced relminmxid, which, for purposes
of reproducing this bug, is not what we want to happen. So relminmxid
is still (incorrectly) set to 1 for every database. However, since
the vacuum did advance relfrozenxid, it will call vac_truncate_clog,
which will call SetMultiXactIdLimit, which will propagate the bogus
datminmxid = 1 setting into shared memory.

(In my testing, this step doesn't work if performed on 9.3.4; you have
to do it on 9.3.5. I think that's because of Tom's commit 78db307bb,
but I believe in a more complex test scenario you might be able to get
this to happen on 9.3.4 also.)

I believe it's the case that an autovacuum of even a single table can
substitute for this step if it happens to advance relfrozenxid but not
relminmxid.

5. The next checkpoint, or the shutdown checkpoint in any event, will
propagate the bogus value of 1 from shared memory back into the
control file.

6. Now try to start 9.3.7. It will see the bogus oldestMultiXid = 1
value in the control file, attempt to read the corresponding offsets
file, and die.

In the process of investigating this, we found a few other things that
seem like they may also be bugs:

- As noted upthread, replaying an older checkpoint after a newer
checkpoint has already happened may lead to similar problems. This
may be possible when starting from an online base backup; or when
restarting a standby that did not perform a restartpoint when
replaying the last checkpoint before the shutdown.

- pg_upgrade sets datminmxid =
old_cluster.controldata.chkpnt_nxtmulti, which is correct only if
there are ZERO multixacts in use at the time of the upgrade.
It would be best, I think, to set this to the same value it had in the
old cluster, but if we're going to use a blanket value, I think it
needs to be chkpnt_oldstMulti.

- There's a third possible problem related to boundary cases in
SlruScanDirCbRemoveMembers, but I don't understand that one well
enough to explain it. Maybe Thomas can jump in here and explain the
concern.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
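
P.S. For anyone following along with steps 3 and 6, here is a rough
sketch (in Python) of how a multixact ID maps to a file name under
pg_multixact/offsets. It assumes the default 8 kB block size, 4-byte
offset entries, and 32 SLRU pages per segment file, which works out to
65536 multixacts per segment. The helper name is made up for
illustration; it is not a PostgreSQL function.

```python
# Sketch only. Assumptions: default BLCKSZ of 8192 bytes, 4-byte
# MultiXactOffset entries, and 32 SLRU pages per segment file.
BLCKSZ = 8192
OFFSET_SIZE = 4
PAGES_PER_SEGMENT = 32

OFFSETS_PER_PAGE = BLCKSZ // OFFSET_SIZE                      # 2048
OFFSETS_PER_SEGMENT = OFFSETS_PER_PAGE * PAGES_PER_SEGMENT    # 65536


def offsets_segment_name(multixact_id: int) -> str:
    """Hypothetical helper: offsets segment file holding multixact_id."""
    # Segment files are named with the segment number as 4 hex digits.
    return "%04X" % (multixact_id // OFFSETS_PER_SEGMENT)


print(offsets_segment_name(1))        # 0000
print(offsets_segment_name(131072))   # 0002
```

Under those assumptions, oldestMultiXid = 1 points at segment 0000,
which is exactly the file that was removed in step 3, while a cluster
that created two new segments in step 1 keeps its live data in 0002 or
later.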