Re: [PATCH 17/17] xfs: ATTR_REPLACE algorithm with LARP enabled needs rework

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, May 06, 2022 at 11:01:23PM -0700, Alli wrote:
> On Fri, 2022-05-06 at 19:45 +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > 
> > We can't use the same algorithm for replacing an existing attribute
> > when logging attributes. The existing algorithm is essentially:
> > 
> > 1. create new attr w/ INCOMPLETE
> > 2. atomically flip INCOMPLETE flags between old + new attribute
> > 3. remove old attr which is marked w/ INCOMPLETE
> > 
> > This algorithm guarantees that we see either the old or new
> > attribute, and if we fail after the atomic flag flip, we don't have
> > to recover the removal of the old attr because we never see
> > INCOMPLETE attributes in lookups.
> > 
> > For logged attributes, however, this does not work. The logged
> > attribute intents do not track the work that has been done as the
> > transaction rolls, and hence the only recovery mechanism we have is
> > "run the replace operation from scratch".
> > 
> > This is further exacerbated by the attempt to avoid needing the
> > INCOMPLETE flag to create an atomic swap. This means we can create
> > a second active attribute of the same name before we remove the
> > original. If we fail at any point after the create but before the
> > removal has completed, we end up with duplicate attributes in
> > the attr btree and recovery only tries to replace one of them.
> > 
> > There are several other failure modes where we can leave partially
> > allocated remote attributes that expose stale data, partially free
> > remote attributes that enable UAF based stale data exposure, etc.
> > 
> > TO fix this, we need a different algorithm for replace operations
> > when LARP is enabled. Luckily, it's not that complex if we take the
> > right first step. That is, the first thing we log is the attri
> > intent with the new name/value pair and mark the old attr as
> > INCOMPLETE in the same transaction.
> > 
> > From there, we then remove the old attr and keep relogging the
> > new name/value in the intent, such that we always know that we have
> > to create the new attr in recovery. Once the old attr is removed,
> > we then run a normal ATTR_CREATE operation relogging the intent as
> > we go. If the new attr is local, then it gets created in a single
> > atomic transaction that also logs the final intent done. If the new
> > attr is remote, the we set INCOMPLETE on the new attr while we
> > allocate and set the remote value, and then we clear the INCOMPLETE
> > flag at in the last transaction taht logs the final intent done.
> > 
> > If we fail at any point in this algorithm, log recovery will always
> > see the same state on disk: the new name/value in the intent, and
> > either an INCOMPLETE attr or no attr in the attr btree. If we find
> > an INCOMPLETE attr, we run the full replace starting with removing
> > the INCOMPLETE attr. If we don't find it, then we simply create the
> > new attr.
> > 
> > Notably, recovery of a failed create that has an INCOMPLETE flag set
> > is now the same - we start with the lookup of the INCOMPLETE attr,
> > and if that exists then we do the full replace recovery process,
> > otherwise we just create the new attr.
> > 
> > Hence changing the way we do the replace operation when LARP is
> > enabled allows us to use the same log recovery algorithm for both
> > the ATTR_CREATE and ATTR_REPLACE operations. This is also the same
> > algorithm we use for runtime ATTR_REPLACE operations (except for the
> > step setting up the initial conditions).
> > 
> > The result is that:
> > 
> > - ATTR_CREATE uses the same algorithm regardless of whether LARP is
> >   enabled or not
> > - ATTR_REPLACE with larp=0 is identical to the old algorithm
> > - ATTR_REPLACE with larp=1 runs an unmodified attr removal algorithm
> >   from the larp=0 code and then runs the unmodified ATTR_CREATE
> >   code.
> > - log recovery when larp=1 runs the same ATTR_REPLACE algorithm as
> >   it uses at runtime.
> > 
> > Because the state machine is now quite clean, changing the algorithm
> > is really just a case of changing the initial state and how the
> > states link together for the ATTR_REPLACE case. Hence it's not a
> > huge amoutn of code for what is a fairly substantial rework
> > of the attr logging and recovery algorithm....
> > 
> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> This looks mostly good, though when I run it with the new tests, I seem
> to run into a failed filesystem check at the end.  "bad attribute count
> 0 in attr block 0".  I suspect we still dont have the removal of the
> fork quite right.  It sounds like you're still working with things
> though, I'll keep looking too.

Yeah, that's a strange one. I didn't get it with the based branch
testing (based on 5.18-rc+for-next) but over the weekend where it
got merged with 5.18-rc5 it appears that the error has manifested
in several test runs. I'll dig into it this morning.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux