Re: [PATCH RESEND v2 01/18] xfs: Fix multi-transaction larp replay

On Wed, 2022-08-10 at 08:52 -0700, Darrick J. Wong wrote:
> On Wed, Aug 10, 2022 at 04:12:58PM +1000, Dave Chinner wrote:
> > On Tue, Aug 09, 2022 at 10:01:49PM -0700, Alli wrote:
> > > On Wed, 2022-08-10 at 11:58 +1000, Dave Chinner wrote:
> > > > On Tue, Aug 09, 2022 at 09:52:55AM -0700, Darrick J. Wong
> > > > wrote:
> > > > > On Thu, Aug 04, 2022 at 12:39:56PM -0700, Allison Henderson
> > > > > wrote:
> > > > > > Recent parent pointer testing has exposed a bug in the
> > > > > > underlying
> > > > > > attr replay.  A multi transaction replay currently performs
> > > > > > a
> > > > > > > single step of the replay, then defers the rest if there
> > > > > > is more
> > > > > > to do.
> > > > 
> > > > Yup.
> > > > 
> > > > > > This causes race conditions with other attr replays that
> > > > > > might be recovered before the remaining deferred work has
> > > > > > had a
> > > > > > chance to finish.
> > > > 
> > > > What other attr replays are we racing against?  There can only
> > > > be
> > > > one incomplete attr item intent/done chain per inode present in
> > > > log
> > > > recovery, right?
> > > No, a rename queues up a set and remove before committing the
> > > transaction.  One for the new parent pointer, and another to
> > > remove the
> > > old one.
> > 
> > Ah. That really needs to be described in the commit message -
> > changing from "single intent chain per object" to "multiple
> > concurrent independent and unserialised intent chains per object"
> > is
> > a pretty important design rule change...
> > 
> > The whole point of intents is to allow complex, multi-stage
> > operations on a single object to be sequenced in a tightly
> > controlled manner. They weren't intended to be run as concurrent
> > lines of modification on single items; if you need to do two
> > modifications on an object, the intent chain ties the two
> > modifications together into a single whole.
> 
> Back when I made the suggestion that resulted in this patch, I was
> pondering why it is that (say) atomic swapext didn't suffer from
> these
> recovery problems, and I realized that for any given inode, you can
> only
> have one ongoing swapext operation at a time.  That's why recovery of
> swapext operations works fine, whereas pptr recovery has this quirk.
> 
> At the time, my thought process was more narrowly focused on making
> log
> recovery mimic runtime more closely.  I didn't make the connection
> between this problem and the other open question I had (see the
> bottom)
> about how to fix pptr attrs when rebuilding a directory.
> 
> > One of the reasons I rewrote the attr state machine for LARP was to
> > enable new multiple attr operation chains to be easily built from
> > the entry points the state machine provides. Parent attr rename
> > needs a new intent chain to be built, not run multiple independent
> > intent chains for each modification.
> > 
> > > It can't be an attr replace because technically the names are
> > > different.
> > 
> > I disagree - we have all the pieces we need in the state machine
> > already, we just need to define separate attr names for the
> > remove and insert steps in the attr intent.
> > 
> > That is, the "replace" operation we execute when an attr set
> > overwrites the value is "technically" a "replace value" operation,
> > but we actually implement it as a "replace entire attribute"
> > operation.
> 
> OH.  Right.  I forgot that ATTR_REPLACE=="replace entire attr".
> 
> If I'm understanding this right, that means that the xfs_rename patch
> ought to detect the situation where there's an existing dirent in the
> target directory, and do something along the lines of:
> 
> 	} else { /* target_ip != NULL */
> 		xfs_dir_replace(...);
> 
> 		xfs_parent_defer_replace(tp, new_parent_ptr, target_dp,
> 				old_diroffset, target_name,
> 				new_diroffset);
> 
> 		xfs_trans_ichgtime(...);
> 
> Where the xfs_parent_defer_replace operation does an ATTR_REPLACE to
> switch:
> 
> (target_dp_ino, target_gen, old_diroffset) == <dontcare>
> 
> to this:
> 
> (target_dp_ino, target_gen, new_diroffset) == target_name
> 
> except, I think we have to log the old name in addition to the new
> name,
> because userspace ATTR_REPLACE operations don't allow name changes?
> 
> I guess this also implies that xfs_dir_replace will pass out the
> offset
> of the old name, in addition to the offset of the new name.
> 
> > Without LARP, we do that overwrite in independent steps via an
> > intermediate INCOMPLETE state to allow two xattrs of the same name
> > to exist in the attr tree at the same time. IOWs, the attr value
> > overwrite is effectively a "set-swap-remove" operation on two
> > entirely independent xattrs, ensuring that if we crash we always
> > have either the old or new xattr visible.
> > 
> > With LARP, we can remove the original attr first, thereby avoiding
> > the need for two versions of the xattr to exist in the tree in the
> > first place. However, we have to do these two operations as a pair
> > of linked independent operations. The intent chain provides the
> > linking, and requires us to log the name and the value of the attr
> > that we are overwriting in the intent. Hence we can always recover
> > the modification to completion no matter where in the operation we
> > fail.
> > 
> > When it comes to a parent attr rename operation, we are effectively
> > doing two linked operations - remove the old attr, set the new attr
> > - on different attributes. Implementation wise, it is exactly the
> > same sequence as a "replace value" operation, except for the fact
> > that the new attr we add has a different name.
> > 
> > Hence the only real difference between the existing "attr replace"
> > and the intent chain we need for "parent attr rename" is that we
> > have to log two attr names instead of one. Basically, we have a new
> > XFS_ATTRI_OP_FLAGS... type for this, and that's what tells us that
> > we are operating on two different attributes instead of just one.
> 
> This answers my earlier question: Yes, and yes.

I see, alrighty then, I'll see if I can put together a new
XFS_ATTRI_OP_FLAGS type that carries both the old and new name.  That
sounds like it should work.  Thanks for all the feedback!

Allison


> 
> > The recovery operation becomes slightly different - we have to run
> > a
> > remove on the old, then a replace on the new - so there's a little
> > bit
> > of new code needed to manage that in the state machine.
> > 
> > These, however, are just small tweaks on the existing replace attr
> > operation, and there should be little difference in performance or
> > overhead between a "replace value" and a "replace entire xattr"
> > operation as they are largely the same runtime operation for LARP.
> > 
> > > So the recovered set grows the leaf and returns -EAGAIN, then the
> > > rest
> > > gets capture committed.  Next up is the recovered remove which
> > > pulls
> > > out the fork, which causes problems when the rest of the set
> > > operation
> > > resumes as a deferred operation.
> > 
> > Yup, and all this goes away when we build the right intent chain
> > for
> > replacing a parent attr rename....
> 
> Funnily enough, just last week I had thought that online repair was
> going to require the ability to replace an entire xattr...
> 
> https://djwong.org/docs/xfs-online-fsck-design/#parent-pointers
> 
> --D
> 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@xxxxxxxxxxxxx
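
The "log both names in one intent, remove the old attr, then set the new
one" idea discussed in this thread can be illustrated with a toy model.
This is only a sketch under stated assumptions: every name below
(toy_inode, toy_replace_intent, toy_replay, and so on) is hypothetical and
is not an XFS symbol, the attr store is a flat in-memory array, and the
"crash" is simulated by stopping after the remove step. It is meant to show
why a single intent carrying the old name, the new name and the new value
can be replayed to completion from either side of the remove.

```c
/* Toy model only: none of these names are real XFS symbols. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_ATTRS 8
#define TOY_NAME_LEN  32

struct toy_attr {
	int  in_use;
	char name[TOY_NAME_LEN];
	char value[TOY_NAME_LEN];
};

struct toy_inode {
	struct toy_attr attrs[TOY_MAX_ATTRS];
};

/* Hypothetical intent: one log item carrying both names and the new value. */
struct toy_replace_intent {
	char old_name[TOY_NAME_LEN];
	char new_name[TOY_NAME_LEN];
	char new_value[TOY_NAME_LEN];
};

/* Return the attr with this name, or NULL if it is not present. */
static struct toy_attr *toy_find(struct toy_inode *ip, const char *name)
{
	for (int i = 0; i < TOY_MAX_ATTRS; i++)
		if (ip->attrs[i].in_use && strcmp(ip->attrs[i].name, name) == 0)
			return &ip->attrs[i];
	return NULL;
}

/* Remove the named attr; removing a missing attr is a no-op. */
static void toy_remove(struct toy_inode *ip, const char *name)
{
	struct toy_attr *a = toy_find(ip, name);

	if (a)
		a->in_use = 0;
}

/* Create or overwrite the named attr; returns -1 if the store is full. */
static int toy_set(struct toy_inode *ip, const char *name, const char *value)
{
	struct toy_attr *a = toy_find(ip, name);

	for (int i = 0; !a && i < TOY_MAX_ATTRS; i++)
		if (!ip->attrs[i].in_use)
			a = &ip->attrs[i];
	if (!a)
		return -1;
	a->in_use = 1;
	snprintf(a->name, sizeof(a->name), "%s", name);
	snprintf(a->value, sizeof(a->value), "%s", value);
	return 0;
}

/*
 * Replay the whole chain from the intent alone. Both steps are idempotent
 * in this toy, so it does not matter whether the crash happened before or
 * after the remove.
 */
static int toy_replay(struct toy_inode *ip, const struct toy_replace_intent *it)
{
	toy_remove(ip, it->old_name);
	return toy_set(ip, it->new_name, it->new_value);
}

/* Simulate: old attr exists, remove completes, crash, then replay. */
int toy_demo_crash_and_replay(void)
{
	struct toy_inode ip = {0};
	struct toy_replace_intent it = { "pptr_old", "pptr_new", "target_name" };
	struct toy_attr *a;

	assert(toy_set(&ip, it.old_name, "old_target_name") == 0);
	toy_remove(&ip, it.old_name);	/* step 1 done, "crash" before step 2 */

	if (toy_replay(&ip, &it) != 0)
		return -1;
	a = toy_find(&ip, it.new_name);
	if (toy_find(&ip, it.old_name) || !a || strcmp(a->value, it.new_value))
		return -1;
	return 0;
}
```

In this toy, toy_demo_crash_and_replay() returns 0 when the replay leaves
only the new attr behind. The real state machine has many more
intermediate states (leaf growth, -EAGAIN rolls, remote values), none of
which are modelled here.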



