Re: [PATCH 17/18] xfs: ATTR_REPLACE algorithm with LARP enabled needs rework

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, May 10, 2022 at 04:53:47PM -0700, Darrick J. Wong wrote:
> On Mon, May 09, 2022 at 10:41:37AM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > 
> > We can't use the same algorithm for replacing an existing attribute
> > when logging attributes. The existing algorithm is essentially:
> > 
> > 1. create new attr w/ INCOMPLETE
> > 2. atomically flip INCOMPLETE flags between old + new attribute
> > 3. remove old attr which is marked w/ INCOMPLETE
> > 
> > This algorithm guarantees that we see either the old or new
> > attribute, and if we fail after the atomic flag flip, we don't have
> > to recover the removal of the old attr because we never see
> > INCOMPLETE attributes in lookups.
> > 
> > For logged attributes, however, this does not work. The logged
> > attribute intents do not track the work that has been done as the
> > transaction rolls, and hence the only recovery mechanism we have is
> > "run the replace operation from scratch".
> > 
> > This is further exacerbated by the attempt to avoid needing the
> > INCOMPLETE flag to create an atomic swap. This means we can create
> > a second active attribute of the same name before we remove the
> > original. If we fail at any point after the create but before the
> > removal has completed, we end up with duplicate attributes in
> > the attr btree and recovery only tries to replace one of them.
> > 
> > There are several other failure modes where we can leave partially
> > allocated remote attributes that expose stale data, partially free
> > remote attributes that enable UAF based stale data exposure, etc.
> > 
> > TO fix this, we need a different algorithm for replace operations
> > when LARP is enabled. Luckily, it's not that complex if we take the
> > right first step. That is, the first thing we log is the attri
> > intent with the new name/value pair and mark the old attr as
> > INCOMPLETE in the same transaction.
> > 
> > From there, we then remove the old attr and keep relogging the
> > new name/value in the intent, such that we always know that we have
> > to create the new attr in recovery. Once the old attr is removed,
> > we then run a normal ATTR_CREATE operation relogging the intent as
> > we go. If the new attr is local, then it gets created in a single
> > atomic transaction that also logs the final intent done. If the new
> > attr is remote, the we set INCOMPLETE on the new attr while we
> > allocate and set the remote value, and then we clear the INCOMPLETE
> > flag at in the last transaction taht logs the final intent done.
> > 
> > If we fail at any point in this algorithm, log recovery will always
> > see the same state on disk: the new name/value in the intent, and
> > either an INCOMPLETE attr or no attr in the attr btree. If we find
> > an INCOMPLETE attr, we run the full replace starting with removing
> > the INCOMPLETE attr. If we don't find it, then we simply create the
> > new attr.
> 
> ...assuming that the INCOMPLETE attr is preserved until we're completely
> done unmapping the rmtval blocks, right?

Yes, it is preserved - the INCOMPLETE flag is held in
the name entry and that's not removed until after the rmtval blocks
have been fully removed.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux