Robert Haas wrote:

> 2. If you pg_upgrade to 9.3.7 or 9.4.2, then you may have datminmxid
> values which are equal to the next-mxid counter instead of the correct
> value; in other words, they are too new.

What you describe is what happens if you upgrade from 9.2 or earlier.
For this case we use this call:

    exec_prog(UTILITY_LOG_FILE, NULL, true,
              "\"%s/pg_resetxlog\" -m %u,%u \"%s\"",
              new_cluster.bindir,
              old_cluster.controldata.chkpnt_nxtmulti + 1,
              old_cluster.controldata.chkpnt_nxtmulti,
              new_cluster.pgdata);

This uses the old cluster's nextMulti value as oldestMulti in the new
cluster, and that value+1 is used as nextMulti.  This is correct: we
don't want to preserve any of the multixact state from the previous
cluster; anything before that value can be truncated with no loss of
critical data.  In fact, there is no critical data before that value at
all.

If you upgrade from 9.3, this other call is used instead:

    /*
     * we preserve all files and contents, so we must preserve both "next"
     * counters here and the oldest multi present on system.
     */
    exec_prog(UTILITY_LOG_FILE, NULL, true,
              "\"%s/pg_resetxlog\" -O %u -m %u,%u \"%s\"",
              new_cluster.bindir,
              old_cluster.controldata.chkpnt_nxtmxoff,
              old_cluster.controldata.chkpnt_nxtmulti,
              old_cluster.controldata.chkpnt_oldstMulti,
              new_cluster.pgdata);

In this case we use the oldestMulti from the old cluster as oldestMulti
in the new cluster, which is also correct.

> A. Most obviously, we should fix pg_upgrade so that it installs
> chkpnt_oldstMulti instead of chkpnt_nxtmulti into datfrozenxid, so
> that we stop creating new instances of this problem.  That won't get
> us out of the hole we've dug for ourselves, but we can at least try to
> stop digging.  (This is assuming I'm right that chkpnt_nxtmulti is the
> wrong thing - anyone want to double-check me on that one?)

I don't think there's anything that we need to fix here.

> B. We need to change find_multixact_start() to fail softly.
> This is important because it's legitimate for it to fail in recovery,
> as discussed upthread, and also because we probably want to eliminate
> the fail-to-start hazard introduced in 9.4.2 and 9.3.7.
> find_multixact_start() is used in three places, and they each require
> separate handling:
>
> - In SetMultiXactIdLimit, find_multixact_start() is used to set
> MultiXactState->oldestOffset, which is used to determine how
> aggressively to vacuum.  If find_multixact_start() fails, we don't
> know how aggressively we need to vacuum to prevent members wraparound;
> it's probably best to decide to vacuum as aggressively as possible.
> Of course, if we're in recovery, we won't vacuum either way; the fact
> that it fails softly is good enough.

Sounds good.

> - In DetermineSafeOldestOffset, find_multixact_start() is used to set
> MultiXactState->offsetStopLimit.  If it fails here, we don't know when
> to refuse multixact creation to prevent wraparound.  Again, in
> recovery, that's fine.  If it happens in normal running, it's not
> clear what to do.  Refusing multixact creation is an awfully blunt
> instrument.  Maybe we can scan pg_multixact/offsets to determine a
> workable stop limit: the first file greater than the current file that
> exists, minus two segments, is a good stop point.  Perhaps we ought to
> use this mechanism here categorically, not just when
> find_multixact_start() fails.  It might be more robust than what we
> have now.

Blunt instruments have the desirable property of being simple.  We don't
want any more clockwork here, I think --- this stuff is pretty
complicated already.  As far as I understand, if during normal running
we see that find_multixact_start has failed, sufficient vacuuming should
get it straight eventually with no loss of data.

> - In TruncateMultiXact, find_multixact_start() is used to set the
> truncation point for the members SLRU.
> If it fails here, I'm guessing the right solution is not to truncate
> anything - instead, rely on intense vacuuming to eventually advance
> oldestMXact to a value whose member data still exists; truncate then.

Fine.

> C. I think we should also change TruncateMultiXact() to truncate
> offsets first, and then members.  As things stand, if we truncate
> members first, we increase the risk of seeing an offset that will fail
> when passed to find_multixact_start(), because TruncateMultiXact()
> might get interrupted before it finishes.  That seems like an
> unnecessary risk.

Not sure about this point.  We did it the way you propose previously,
and found it to be a problem because sometimes we tried to read an
offset file that was no longer there.  Do we really read member files
anywhere?  I thought we only tried to read offset files.  If we remove
member files, what is it that we try to read and find not to be present?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
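
To illustrate the two upgrade cases discussed above, here is a minimal sketch of which values end up as the "next" and "oldest" arguments of pg_resetxlog -m. This is not pg_upgrade code. The types OldControlData and ResetMultiArgs and all three functions are hypothetical names invented for this example; only the field names chkpnt_nxtmulti and chkpnt_oldstMulti and the "-m %u,%u" format come from the calls quoted in the message. The IDs are assumed to be 32-bit unsigned values, as the %u format suggests.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the two old-cluster control data fields
 * named in the message. */
typedef struct {
    uint32_t chkpnt_nxtmulti;   /* old cluster's next multixact ID */
    uint32_t chkpnt_oldstMulti; /* old cluster's oldest multixact ID */
} OldControlData;

/* The pair passed as "-m next,oldest" (hypothetical helper type). */
typedef struct {
    uint32_t next_multi;
    uint32_t oldest_multi;
} ResetMultiArgs;

/* Upgrade from 9.2 or earlier: no multixact state is preserved, so the
 * old nextMulti becomes the new oldestMulti and nextMulti is that + 1. */
static ResetMultiArgs args_from_pre93(const OldControlData *old)
{
    ResetMultiArgs a;
    a.oldest_multi = old->chkpnt_nxtmulti;
    a.next_multi = old->chkpnt_nxtmulti + 1;
    return a;
}

/* Upgrade from 9.3: files and contents are preserved, so both the next
 * counter and the oldest multi are carried over unchanged. */
static ResetMultiArgs args_from_93(const OldControlData *old)
{
    ResetMultiArgs a;
    a.next_multi = old->chkpnt_nxtmulti;
    a.oldest_multi = old->chkpnt_oldstMulti;
    return a;
}

/* Format only the -m option, in the same order as the quoted calls:
 * next first, oldest second. Returns snprintf's return value. */
static int format_m_option(char *buf, size_t buflen, ResetMultiArgs a)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "-m %u,%u",
                    (unsigned) a.next_multi, (unsigned) a.oldest_multi);
}
```

With an old cluster whose nextMulti is 100 and oldestMulti is 40, the first function yields next=101, oldest=100 and the second yields next=100, oldest=40. In both cases the new cluster's oldestMulti is never newer than anything the new cluster can still reference, which is the point made in the reply above.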