On Tue, Jun 2, 2015 at 9:30 AM, Alvaro Herrera <alvherre@xxxxxxxxxxxxxxx> wrote:
> My guess is that the file existed, and perhaps had one or more pages,
> but the wanted page doesn't exist, so we tried to read but got 0 bytes
> back.  read() returns 0 in this case but doesn't set errno.
>
> I didn't find a way to set things so that the file exists but is of
> shorter contents than oldestMulti by the time the checkpoint record is
> replayed.

I'm just starting to learn about the recovery machinery, so forgive me if I'm missing something basic here, but I just don't get this.  As I understand it, offsets/0046 should either have been copied with that page present in it, if the page existed before the backup started (apparently not in this case), or extended to contain it by WAL records that come after the backup label but before the checkpoint record that references it (also apparently not in this case).  If neither of those things happened, then this is completely different from the segment-does-not-exist case, where we read zeroes during recovery on the assumption that later WAL records must be about to delete the file.  There is no way that future WAL records will make an existing segment file shorter!  So at this point don't we know that there is something wrong with the backup itself?

Put another way, if you bring this cluster up under 9.4.1, won't it also be unable to access multixact 4624559 at this point?  Of course 9.4.1 won't try to do so during recovery the way 9.4.2 does, but I'm just trying to understand how this is supposed to work on 9.4.1 if it needs to access that multixact for some other reason once normal running is reached (say you recover up to that checkpoint and then run pg_get_multixact_members, or a row has that xmax and its members have to be looked up by a vacuum or any normal transaction).  In other words, isn't this a base backup that is somehow broken, not at all like the pg_upgrade corruption case (which involved the specific case of multixact 1 and an entirely missing segment file), and 9.4.2 just tells you about it sooner than 9.4.1 would?

For what it's worth, I've also spent a lot of time trying to reproduce base backup problems with multixact creation, vacuums and checkpoints injected at various points between copying the backup label, pg_multixact and pg_control.  So far I have failed to produce anything more interesting than the 'reading zeroes' case (see attached "copy-after-truncation.sh") and a case where the control file points at a segment that doesn't exist, but it doesn't matter: the backup label points at a checkpoint from a time when the segment did exist, oldestMultiXactId is updated from there, and recovery then proceeds exactly as it should (see "copy-before-truncation.sh").  I updated my scripts to look a bit more like your nicely automated example (though mine use a different trick to create small quantities of multixacts, so they run against unpatched master).  I have also been considering a scenario where multixact ID wraparound occurs during a base backup with some ordering that causes trouble, but I don't yet see why that would break if you replay the WAL from the backup label checkpoint (and I think the repro would take days/weeks to run...).

--
Thomas Munro
http://www.enterprisedb.com
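To make the "access it for some other reason once normal running is reached" question concrete, this is roughly the probe I have in mind against a 9.4.1 server started from the same base backup.  It is only a sketch: the data directory path, port and the expected error message are assumptions for this scenario, not something taken from the report, and only the multixact ID and pg_get_multixact_members come from the discussion above.

#!/bin/sh
# Sketch: start a 9.4.1 server on a copy of the suspect base backup, let it
# reach consistency, then ask for the members of the multixact that 9.4.2
# trips over during replay.  PGDATA941 and the port are placeholders.
PGDATA941=/tmp/pgdata-941
PORT=5441

pg_ctl -D "$PGDATA941" -o "-p $PORT" -w start

# If the offsets/0046 page really is missing from the backup, I would expect
# this to fail with a "could not access status of transaction ..." style
# error rather than returning any member xids.
psql -p "$PORT" -d postgres \
     -c "SELECT * FROM pg_get_multixact_members('4624559');"

pg_ctl -D "$PGDATA941" -m fast -w stop

If that lookup fails on 9.4.1 too, it would support the reading that the backup itself is broken and 9.4.2 merely reports it earlier, during replay, instead of later at first access.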
Attachment:
copy-after-truncation.sh
Description: Bourne shell script
Attachment:
copy-before-truncation.sh
Description: Bourne shell script