On 2015-06-01 14:22:32 -0400, Robert Haas wrote: > commit d33b4eb0167f465edb00bd6c0e1bcaa67dd69fe9 > Author: Robert Haas <rhaas@xxxxxxxxxxxxxx> > Date: Fri May 29 14:35:53 2015 -0400 > > foo Hehe! > diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c > index 9568ff1..aca829d 100644 > --- a/src/backend/access/transam/multixact.c > +++ b/src/backend/access/transam/multixact.c > @@ -199,8 +199,9 @@ typedef struct MultiXactStateData > MultiXactOffset nextOffset; > > /* > - * Oldest multixact that is still on disk. Anything older than this > - * should not be consulted. These values are updated by vacuum. > + * Oldest multixact that may still be referenced from a relation. > + * Anything older than this should not be consulted. These values are > + * updated by vacuum. > */ > MultiXactId oldestMultiXactId; > Oid oldestMultiXactDB; > @@ -213,6 +214,18 @@ typedef struct MultiXactStateData > */ > MultiXactId lastCheckpointedOldest; > > + /* > + * This is the oldest file that actually exist on the disk. This value > + * is initialized by scanning pg_multixact/offsets, and subsequently > + * updated each time we complete a truncation. We need a flag to > + * indicate whether this has been initialized yet. > + */ > + MultiXactId oldestMultiXactOnDisk; > + bool oldestMultiXactOnDiskValid; > + > + /* Has TrimMultiXact been called yet? */ > + bool didTrimMultiXact; I'm not really convinced tying things closer to having done trimming is easier to understand than tying things to recovery having finished. E.g. if (did_trim) oldestOffset = GetOldestReferencedOffset(oldest_datminmxid); imo is harder to understand than if (!InRecovery). Maybe we could just name it finishedStartup and rename the functions accordingly? > @@ -1956,12 +1971,6 @@ StartupMultiXact(void) > */ > pageno = MXOffsetToMemberPage(offset); > MultiXactMemberCtl->shared->latest_page_number = pageno; > - > - /* > - * compute the oldest member we need to keep around to avoid old member > - * data overrun. > - */ > - DetermineSafeOldestOffset(MultiXactState->oldestMultiXactId); > } Maybe it's worthwhile to add a 'NB: At this stage the data directory is not yet necessarily consistent' StartupMultiXact's comments, to avoid reintroducing problems like this? > /* > + * We can read this without a lock, because it only changes when nothing > + * else is running. > + */ > + did_trim = MultiXactState->didTrimMultiXact; Err, Hot Standby? It might be ok to not lock, but the comment is definitely wrong. I'm inclined to simply use locking, this doesn't look sufficiently critical performancewise. > +static MultiXactOffset > +GetOldestReferencedOffset(MultiXactId oldestMXact) > +{ > + MultiXactId earliest; > + MultiXactOffset oldestOffset; > + > + /* > + * Because of bugs in early 9.3.X and 9.4.X releases (see comments in > + * TrimMultiXact for details), oldest_datminmxid might point to a > + * nonexistent multixact. If so, use the oldest one that actually > + * exists. Anything before this can't be successfully used anyway. > + */ > + earliest = GetOldestMultiXactOnDisk(); > + if (MultiXactIdPrecedes(oldestMXact, earliest)) > + oldestMXact = earliest; Hm. If GetOldestMultiXactOnDisk() gets the starting point by scanning the disk it'll always get one at a segment boundary, right? I'm not sure that's actually ok; because the value at the beginning of the segment can very well end up being a 0, as MaybeExtendOffsetSlru() will have filled the page with zeros. I think that should be harmless, the worst that can happen is that oldestOffset errorneously is 0, which should be correct, even though possibly overly conservative, in these cases. Greetings, Andres Freund -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general