On Wed, Jun 3, 2015 at 4:48 AM, Thomas Munro <thomas.munro@xxxxxxxxxxxxxxxx> wrote:
> On Wed, Jun 3, 2015 at 3:42 PM, Alvaro Herrera <alvherre@xxxxxxxxxxxxxxx> wrote:
>> Thomas Munro wrote:
>>> On Tue, Jun 2, 2015 at 9:30 AM, Alvaro Herrera <alvherre@xxxxxxxxxxxxxxx> wrote:
>>> > My guess is that the file existed, and perhaps had one or more pages,
>>> > but the wanted page doesn't exist, so we tried to read but got 0 bytes
>>> > back.  read() returns 0 in this case but doesn't set errno.
>>> >
>>> > I didn't find a way to set things so that the file exists but is of
>>> > shorter contents than oldestMulti by the time the checkpoint record is
>>> > replayed.
>>>
>>> I'm just starting to learn about the recovery machinery, so forgive me
>>> if I'm missing something basic here, but I just don't get this.  As I
>>> understand it, offsets/0046 should either have been copied with that
>>> page present in it if it existed before the backup started (apparently
>>> not in this case), or extended to contain it by WAL records that come
>>> after the backup label but before the checkpoint record that
>>> references it (also apparently not in this case).
>>
>> Exactly --- that's the spot at which I am, also.  I have had this
>> spinning in my head for three days now, and tried every single variation
>> that I could think of, but like you I was unable to reproduce the issue.
>> However, our customer took a second base backup and it failed in exactly
>> the same way, modulo some changes to the counters (the file that
>> didn't exist was 004B rather than 0046).  I'm still at a loss at what
>> the failure mode is.  We must be missing some crucial detail ...
>
> I have finally reproduced that error!  See attached repro shell script.
>
> The conditions are:
>
> 1.  next multixact == oldest multixact (no active multixacts, pointing
> past the end)
> 2.  next multixact would be the first item on a new page (multixact % 2048 == 0)
> 3.  the page must not be the first in a segment (or we'd get the
> read-zeroes case)
>
> That gives you odds of 1/2048 * 31/32 * (probability of a wraparound
> vacuum followed by no multixact creations right before your backup
> checkpoint).  That seems like reasonably low odds... if it happened
> twice in a row, maybe I'm missing something here and there is some
> other way to get this...
>
> I realise now that this is actually a symptom of a problem spotted by
> Noah recently:
>
> http://www.postgresql.org/message-id/20150601045534.GB23587@xxxxxxxxxxxxxxxxxxxx
>
> He noticed the problem for segment boundaries, when not in recovery.
> In recovery, segment boundaries don't raise an error (the read-zeroes
> case applies), but page boundaries do.  The fix is probably to do
> nothing if they are the same, as we do elsewhere, like in the attached
> patch.

Actually, we still need to call DetermineSafeOldestOffset in that
case.  Otherwise, if someone goes from lots of multixacts to none, the
stop point won't advance.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
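
For readers following the arithmetic in the three conditions quoted above, here is a rough sketch of how a multixact ID maps onto an offsets segment file, and of the 1/2048 * 31/32 factor.  It is not code from PostgreSQL or from the attached repro script.  The constants (2048 entries per page, 32 pages per segment) are inferred from the numbers in the thread and assume the default 8 kB block size; the function names and the example multixact value are made up for illustration.

```python
# Assumed constants, inferred from the thread ("% 2048" and "31/32");
# they correspond to PostgreSQL's default 8 kB block size.
OFFSETS_PER_PAGE = 2048
PAGES_PER_SEGMENT = 32


def offsets_location(multi):
    """Return (segment file name, page within segment, entry within page)."""
    page = multi // OFFSETS_PER_PAGE
    segment = page // PAGES_PER_SEGMENT
    return ("%04X" % segment, page % PAGES_PER_SEGMENT, multi % OFFSETS_PER_PAGE)


def hits_failure_case(oldest_multi, next_multi):
    """True when all three conditions listed in the thread hold."""
    _, page_in_segment, entry = offsets_location(next_multi)
    return (
        oldest_multi == next_multi      # 1. no active multixacts
        and entry == 0                  # 2. first item on a new page
        and page_in_segment != 0        # 3. not the first page of a segment
    )


# Fraction of multixact values that satisfy conditions 2 and 3.
page_boundary_odds = (1 / OFFSETS_PER_PAGE) * ((PAGES_PER_SEGMENT - 1) / PAGES_PER_SEGMENT)

# Hypothetical value: first entry of the sixth page of segment 0046.
example = 0x46 * OFFSETS_PER_PAGE * PAGES_PER_SEGMENT + 5 * OFFSETS_PER_PAGE
print(offsets_location(example))
print(hits_failure_case(example, example))
print(page_boundary_odds)
```

The remaining factor in the quoted odds, the probability of a wraparound vacuum followed by no multixact creations right before the backup checkpoint, depends on workload and is not modelled here.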