On Tue, Jun 2, 2015 at 9:30 AM, Alvaro Herrera <alvherre@xxxxxxxxxxxxxxx> wrote:
> My guess is that the file existed, and perhaps had one or more pages,
> but the wanted page doesn't exist, so we tried to read but got 0 bytes
> back.  read() returns 0 in this case but doesn't set errno.
>
> I didn't find a way to set things so that the file exists but is of
> shorter contents than oldestMulti by the time the checkpoint record is
> replayed.

I'm just starting to learn about the recovery machinery, so forgive me if I'm missing something basic here, but I just don't get this.  As I understand it, offsets/0046 should either have been copied with that page present in it, if the page existed before the backup started (apparently not in this case), or extended to contain it by WAL records that come after the backup label but before the checkpoint record that references it (also apparently not in this case).  If neither of those things happened, then this is completely different from the segment-does-not-exist case, where we read zeroes during recovery on the assumption that later WAL records must be about to delete the file.  There is no way that future WAL records will make an existing segment file shorter!  So at this point don't we know that there is something wrong with the backup itself?

Put another way, if you bring this cluster up under 9.4.1, won't it also be unable to access multixact 4624559 at this point?  Of course 9.4.1 won't try to do so during recovery the way 9.4.2 does, but I'm just trying to understand how this is supposed to work on 9.4.1 if it needs to access that multixact for some other reason once normal running is reached (say you recover up to that checkpoint and then run pg_get_multixact_members, or a row has that xmax and its members have to be looked up by a vacuum or any normal transaction).  In other words, isn't this a base backup that is somehow broken, not at all like the pg_upgrade corruption case (which involved the specific case of multixact 1 and an entirely missing segment file), and 9.4.2 just tells you about it sooner than 9.4.1 would?

For what it's worth, I've also spent a lot of time trying to reproduce base backup problems with multixact creation, vacuums and checkpoints injected at various points between copying the backup label, pg_multixact and pg_control.  So far I have failed to produce anything more interesting than the 'reading zeroes' case (see attached "copy-after-truncation.sh") and a case where the control file points at a segment that doesn't exist, but it doesn't matter: the backup label points at a checkpoint from a time when the segment did exist, oldestMultiXactId is updated from there, and recovery then proceeds exactly as it should (see "copy-before-truncation.sh").  I updated my scripts to look a bit more like your nicely automated example (though mine use a different trick to create small quantities of multixacts, so they run against unpatched master).  I have also been considering a scenario where multixact ID wraparound occurs during a base backup with some ordering that causes trouble, but I don't yet see why that would break if you replay the WAL from the backup label checkpoint (and I think the repro would take days/weeks to run...).

--
Thomas Munro
http://www.enterprisedb.com
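To make the "access it for some other reason once normal running is reached" question concrete, this is roughly the probe I have in mind against a 9.4.1 server started from the same base backup.  It is only a sketch: the data directory path, port and the expected error message are assumptions for this scenario, not something taken from the report, and only the multixact ID and pg_get_multixact_members come from the discussion above.

#!/bin/sh
# Sketch: start a 9.4.1 server on a copy of the suspect base backup, let it
# reach consistency, then ask for the members of the multixact that 9.4.2
# trips over during replay.  PGDATA941 and the port are placeholders.
PGDATA941=/tmp/pgdata-941
PORT=5441

pg_ctl -D "$PGDATA941" -o "-p $PORT" -w start

# If the offsets/0046 page really is missing from the backup, I would expect
# this to fail with a "could not access status of transaction ..." style
# error rather than returning any member xids.
psql -p "$PORT" -d postgres \
     -c "SELECT * FROM pg_get_multixact_members('4624559');"

pg_ctl -D "$PGDATA941" -m fast -w stop

If that lookup fails on 9.4.1 too, it would support the reading that the backup itself is broken and 9.4.2 merely reports it earlier, during replay, instead of later at first access.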
Attachment:
copy-after-truncation.sh
Description: Bourne shell script
Attachment:
copy-before-truncation.sh
Description: Bourne shell script