On Fri, May 29, 2015 at 10:37:57AM +1200, Thomas Munro wrote:
> On Fri, May 29, 2015 at 7:56 AM, Robert Haas <robertmhaas@xxxxxxxxx> wrote:
> > - There's a third possible problem related to boundary cases in
> > SlruScanDirCbRemoveMembers, but I don't understand that one well
> > enough to explain it.  Maybe Thomas can jump in here and explain the
> > concern.
>
> I noticed something in passing which is probably not harmful, and not
> relevant to this bug report; it was just a bit confusing while
> testing: SlruScanDirCbRemoveMembers never deletes any files if
> rangeStart == rangeEnd.  In practice, if you have an idle cluster with
> a lot of multixact data and you VACUUM FREEZE all databases and then
> CHECKPOINT, you might be surprised to see no member files going away
> quite yet, but they'll eventually be truncated by a future checkpoint,
> once rangeEnd has had a chance to advance to the next page due to more
> multixacts being created.
>
> If we want to fix this one day, maybe the right thing to do is to
> treat the rangeStart == rangeEnd case the same way we treat rangeStart
> < rangeEnd, that is, to assume that the range of pages isn't
> wrapped/inverted in this case.

I agree.  Because we round rangeStart down to a segment boundary, the
oldest and next member offsets falling on the same page typically
implies rangeStart < rangeEnd.  Only when the page they share happens
to be the first page of a segment does one observe
rangeStart == rangeEnd.

While testing this (with inconsistent-multixact-fix-master.patch
applied, FWIW), I noticed a nearby bug with a similar symptom.
TruncateMultiXact() omits the nextMXact == oldestMXact special case
found in every other find_multixact_start() caller, so it reads the
offset of a not-yet-created MultiXactId.  The usual outcome is to get
rangeStart == 0, so we truncate less than we could.  This can't make
us truncate excessively, because nextMXact == oldestMXact implies no
table contains any mxid.  If nextMXact happens to be the first mxid of
a segment, though, an error is possible.  Procedure:

1. Make a fresh cluster.
2. UPDATE pg_database SET datallowconn = true
3. Consume precisely 131071 mxids.  Number of offsets per mxid is
   unimportant.  (One way to burn mxids is sketched in the P.S. below.)
4. vacuumdb --freeze --all

Expected state after those steps:

$ pg_controldata | grep NextMultiXactId
Latest checkpoint's NextMultiXactId:  131072

Checkpoint will then fail like this:

26699 2015-05-31 17:22:33.134 GMT LOG:  statement: checkpoint
26661 2015-05-31 17:22:33.134 GMT DEBUG:  performing replication slot checkpoint
26661 2015-05-31 17:22:33.136 GMT ERROR:  could not access status of transaction 131072
26661 2015-05-31 17:22:33.136 GMT DETAIL:  Could not open file "pg_multixact/offsets/0002": No such file or directory.
26699 2015-05-31 17:22:33.234 GMT ERROR:  checkpoint request failed
26699 2015-05-31 17:22:33.234 GMT HINT:  Consult recent messages in the server log for details.
26699 2015-05-31 17:22:33.234 GMT STATEMENT:  checkpoint

This does not block startup, and creating one mxid hides the problem
again.  Thus, it is not a top-priority bug like some other parts of
this thread.  I mention it today mostly so it doesn't surprise hackers
testing other fixes.

Thanks,
nm
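
P.S. For step 3, here is a minimal sketch of one way to consume mxids;
the table name, session layout, and loop are illustrative, not the
exact script I used.  A new MultiXactId is allocated each time a
transaction locks a row that a different, still-open transaction
already has locked, so hold one row lock open in a first session and
take fresh locks from short-lived transactions in a second:

  -- session 1: create a victim row, lock it, and leave this
  -- transaction open while session 2 runs
  CREATE TABLE mxburn (id int PRIMARY KEY);
  INSERT INTO mxburn VALUES (1);
  BEGIN;
  SELECT id FROM mxburn WHERE id = 1 FOR KEY SHARE;

  # session 2: each psql invocation is its own transaction, so each
  # iteration adds a second locker to the row and allocates one new
  # mxid (assuming an otherwise-idle cluster, so nothing else creates
  # mxids behind your back)
  for i in $(seq 131071); do
    psql -qc "SELECT id FROM mxburn WHERE id = 1 FOR KEY SHARE" >/dev/null
  done

Afterward, COMMIT in session 1 before step 4, so the VACUUM FREEZE can
advance the cluster's oldest mxid all the way to NextMultiXactId.  On
a fresh cluster, where mxids start at 1, the loop leaves
NextMultiXactId at 131072, matching the expected state above.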