Re: [PATCH 7/9] xfs: Add attr context to log item

On Wed, Apr 24, 2019 at 08:25:33AM -0700, Darrick J. Wong wrote:
> On Wed, Apr 24, 2019 at 08:17:48AM -0400, Brian Foster wrote:
> > On Tue, Apr 23, 2019 at 09:10:16PM -0700, Darrick J. Wong wrote:
> > > Sorry I'm late back to the party...
> > > 
> > > On Tue, Apr 23, 2019 at 07:24:40PM -0700, Allison Henderson wrote:
> > > > 
> > > > On 4/23/19 6:20 AM, Brian Foster wrote:
> > > > > On Mon, Apr 22, 2019 at 03:01:27PM -0700, Allison Henderson wrote:
> > > > > > 
> > > > > > 
> > > > > > On 4/22/19 6:03 AM, Brian Foster wrote:
> > > > > > > On Fri, Apr 12, 2019 at 03:50:34PM -0700, Allison Henderson wrote:
> > > > > > > > This patch modifies xfs_attr_item to store a xfs_da_args, a xfs_buf pointer
> > > > > > > > and a new state type. We will use these in the next patch when
> > > > > > > > we modify xfs_set_attr_args to roll transactions by returning EAGAIN.
> > > > > > > > Because the subroutines of this function modify the contents of these
> > > > > > > > structures, we need to find a place to store them where they remain
> > > > > > > > instantiated across multiple calls to xfs_set_attr_args.
> > > > > > > > 
> > > > > > > > Signed-off-by: Allison Henderson <allison.henderson@xxxxxxxxxx>
> > > > > > > > ---
> > > > > > > 
> > > > > > > I see Darrick has already commented on the whole state thing. I'll
> > > > > > > probably have to grok the next patch to comment further, but just a
> > > > > > > couple initial thoughts:
> > > > > > > 
> > > > > > > First, I hit a build failure with this patch. It looks like there's a
> > > > > > > missed include in the scrub code:
> > > > > > > 
> > > > > > >     ...
> > > > > > >     CC [M]  fs/xfs/scrub/repair.o
> > > > > > > In file included from fs/xfs/scrub/repair.c:32:
> > > > > > > fs/xfs/libxfs/xfs_attr.h:105:21: error: field ‘xattri_args’ has incomplete type
> > > > > > >     struct xfs_da_args xattri_args;   /* args context */
> > > > > > Hmm, ok.  I'll get that corrected, I probably need to clean out my workspace
> > > > > > and build from scratch.
> > > > > > 
> > > > > > >     ...
> > > > > > > 
> > > > > > > Second, the commit log suggests that the states will reflect the current
> > > > > > > transaction roll points (i.e., establishing re-entry points down in
> > > > > > > xfs_attr_set_args()). I'm kind of wondering if we should break these
> > > > > > > xattr set sub-sequences down into smaller helper functions (refactoring
> > > > > > > the existing code as we go) such that the mechanism could technically be
> > > 
> > > I had had the thought of "why not just give each step of setting an
> > > attribute its own log item, so we don't have to have this STATE_NNN
> > > business?" but then realized that will generate an insane amount of
> > > boilerplate, and you're already close to a better solution, so I shut up
> > > to think harder. :)
> > > 
> > 
> > The thought of separating things down into smaller "ops" popped into my
> > head (not necessarily separate/smaller log items), but I hadn't really
> > thought it through to this point...
> > 
> > > > > > > used deferred or not. Re: the previous thought on whether to defer xattr
> > > > > > > removes or not, there might also be cases where there's not a need to
> > > > > > > defer xattr sets.
> > > > > > > 
> > > > > > > E.g., taking a quick peek into the next patch, the state 1 case in
> > > > > > > xfs_attr_try_sf_addname() is actually a transaction commit, which I
> > > > > > > think means we're done. We'd have done an attr memory allocation,
> > > > > > > deferred op and transaction roll where none was necessary so it might
> > > > > > > not be worth it to defer in that scenario. Hmm, it also looks like we
> > > > > > > return -EAGAIN in places where we've not actually done any work, like if
> > > > > > > a shortform add attempt returns -ENOSPC (or the -EAGAIN return before we
> > > > > > > even attempt the sf add). That kind of looks like a waste of transaction
> > > > > > > rolls and further suggests it might be cleaner to break this whole path
> > > > > > > down into helpers and put it back together in a way more conducive to
> > > > > > > deferred operations.
> > > 
> > > Er, agreed:
> > > 
> > > > > > Yes, this area is a bit of a wart the way it is right now.  I think you're
> > > > > > right in that ultimately we may end up having to do a lot of refactoring in
> > > > > > order to have more efficient "re-entry points".  The state machine is hard
> > > > > > to get into subroutines, so it's limited in use in the top level function.
> > > 
> > > So my current understanding of the problem is that we have this big old
> > > xfs_attr_set_args function that handles multiple responsibilities requiring
> > > transaction rolls, which we can't do directly inside a ->finish_item
> > > handler:
> > > 
> > >  1. If no attr fork, add one.
> > >  2. If shortform attr fork, try to put it in the sf area.
> > >  3. If shortform attr fork and out of space, convert to leaf format.
> > >  4. Add attr to leaf/node attr tree.
> > > 
> > 
> > And there are a bunch of tx rolls down in the #4 codepath that this
> > series currently just tosses away. I'm not quite sure how appropriate
> > that is, but I also don't think we necessarily need to preserve each and
> > every transaction roll as implemented by the current code.
> > 
> > IOW, I think it absolutely makes sense to step back from the current
> > behavior and reassess the best/required places to roll xattr ops in
> > progress as well as the transaction reservation itself.
> 
> Yes, it would help to make a list of every small step that could
> possibly be required to set an attribute.  That will help narrow down
> how many defer op pieces are needed.
> 
> Another thought I had is that having the finish_item continually logging
> a new intent with the latest state means that we can free the old intent
> item, which helps us avoid the problem of pinning the log tail at that
> first intent item while we scramble around doing a whole lot of rolling
> and other work to get to the done item.
> 
> > > So how about this: refactor each of these pieces into a separate
> > > function, then add a separate XFS_ATTR_OP_FLAGS_* value for each of
> > > these little pieces.  xfs_trans_attr() can call the appropriate little
> > > function for the OP_FLAG and xfs_attr_finish_item can figure out which
> > > state comes next based on the return value.
> > > 
> > > By directly mapping distinct OP_FLAGS to each piece of the attr setting
> > > puzzle, you can use the existing "roll and come back" part of the defer
> > > ops machinery.
> > > 
> > > If _finish_item thinks we're done then we just exit.  Otherwise, store
> > > the new state in the (struct xfs_attr_item *) parameter passed into
> > > _finish_item and return -EAGAIN, which puts the defer item back on the
> > > defer op list, logs a new xattr intent with the new state, rolls the
> > > transaction, and tries to finish the attr again.  I think you've already
> > > done this last part.
> > > 
> > 
> > That sounds plausible to me. One concern I have is that I think we
> > should try to avoid creating more unnecessary complexity in the dfops
> > state mechanism simply to accommodate a messy xattr implementation. For
> > example, consider the following sequence for a simple set of an xattr
> > that requires leaf format and remote value block(s):
> > 
> > - try sf add
> > - returns -ENOSPC, convert to leaf and roll tx
> > - attempt to add the xattr (xfs_attr_leaf_addname())
> > 	- if -ENOSPC, convert to node and call xfs_attr_node_addname()
> > 	- else call xfs_attr3_leaf_add_work()
> > 		- add entry
> > 		- if remoteval, set INCOMPLETE
> > - roll tx
> > - if remoteval, call xfs_attr_rmtval_set()
> > 	- block allocation, tx roll loop
> > 	- copy remote value into bufs, xfs_bwrite()
> > - if remoteval, xfs_attr3_leaf_clearflag()
> > 	- clear INCOMPLETE
> > 	- update/log rmt pointers
> > 	- roll tx
> > 
> > I'm wondering 1.) how much of this is necessary with an intent based
> > implementation and 2.) how much of this can be refactored to not require
> > complex state tracking.
> > 
> > For example, all of the format conversions that occur before we actually
> > make any modifications associated with the xattr (i.e., -ENOSPC returns
> > from the current format) seem to me like they could easily be performed and
> > immediately return -EAGAIN without any state tracking. The retry should
> > pick up the current format of the fork and retry there. Thus, ISTM we
> > could drop the whole xfs_attr_leaf_addname() -> xfs_attr3_leaf_to_node()
> > -> xfs_attr_node_addname() codepath in favor of a format conversion and
> > -EAGAIN retry that calls directly into xfs_attr_node_addname().
> 
> That had been my other thought -- in theory we keep the inode locked
> across all the transaction rolls, so we could auto-detect what we need
> to do.
> 

Indeed, at the very least it might reduce the number of "on-disk" state
markers we have to define.
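
To make the "auto-detect" idea a bit more concrete, here is a rough
userspace sketch, not kernel code. Every name in it (fake_inode,
fork_fmt, attr_set_step, the capacities) is made up for illustration and
does not correspond to anything in the series. The only point is that a
step which does nothing but a format conversion can return -EAGAIN
without recording a state, because the retry re-reads the fork format:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the attr fork formats; not the kernel's enums. */
enum fork_fmt { FMT_NONE, FMT_SF, FMT_LEAF, FMT_NODE };

struct fake_inode {
	enum fork_fmt	fmt;	/* current attr fork format */
	int		used;	/* entries stored so far */
};

/* Made-up capacities per format, chosen only so that conversions trigger. */
static int fmt_capacity(enum fork_fmt fmt)
{
	switch (fmt) {
	case FMT_SF:	return 2;
	case FMT_LEAF:	return 4;
	case FMT_NODE:	return 1 << 20;
	default:	return 0;
	}
}

/*
 * One step of an attr set. Returns 0 once the attr has been added, or
 * -EAGAIN if the step only changed the fork format and the caller should
 * roll and call again. No state marker is kept between calls.
 */
static int attr_set_step(struct fake_inode *ip)
{
	assert(ip->fmt >= FMT_NONE && ip->fmt <= FMT_NODE);

	if (ip->fmt == FMT_NONE) {
		ip->fmt = FMT_SF;		/* add the attr fork */
		return -EAGAIN;
	}
	if (ip->used >= fmt_capacity(ip->fmt)) {
		if (ip->fmt == FMT_NODE)
			return -ENOSPC;
		ip->fmt++;			/* sf -> leaf, or leaf -> node */
		return -EAGAIN;
	}
	ip->used++;				/* the actual add */
	return 0;
}

/* Stand-in for the dfops retry loop: count one "roll" per -EAGAIN. */
static int attr_set(struct fake_inode *ip, int *rolls)
{
	int error;

	*rolls = 0;
	while ((error = attr_set_step(ip)) == -EAGAIN)
		(*rolls)++;
	return error;
}
```

Obviously the real thing has to deal with remote values, renames and so
on, which is where a state marker may still be needed.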

> > Once we have leaf format and we're doing remote block allocation, how
> > much could we get away with by re-looking up the entry, finding that
> > we're still short of remote blocks and performing another
> > xfs_bmapi_write() -> -EAGAIN cycle until we're good to copy in the xattr
> > value?
> > 
> > What about all this INCOMPLETE stuff? Do we even need that with an
> > intent based implementation?
> 
> No.  AFAIK the INCOMPLETE flag exists to hide attrs from userspace until
> we're totally done setting them up, and is therefore unnecessary with an
> intent implementation.  Repair zaps any INCOMPLETE attrs it finds.
> 
> > My understanding was that was because we
> > had to roll the transaction and thus could leave an incomplete xattr on
> > disk. I haven't looked too far into it so perhaps there's more to it
> > than that, but if not and this is no longer a problem with an intent
> > based implementation then perhaps much of that code and associated tx
> > rolls can be bypassed as well.
> 
> Getting rid of the INCOMPLETE wonkiness would be the strongest argument
> for switching the regular attr manipulation paths to use intents, though
> we'd have to toggle it with some feature or other.
> 
> (Some feature or other being parent pointers, or possibly just migrating
> the free space tracking parts of dir3 to a "new" attr4 format for better
> speed.)
> 
> > This is not to say that we won't require any such state tracking as
> > you've described above. The whole block allocation thing above may
> > require a state marker to get around attempts to set the xattr name
> > again and get back to the remote value block allocation code. It also
> > looks like we can do post xattr set format changes (i.e., node -> leaf,
> > leaf -> sf) that might require something like that to make sure we don't
> > go and retry an xattr set we've already completed. The point is just that
> > I'd prefer that we explore how much we can simplify this mess of an
> > implementation as much as possible (the above is all very handwavy)
> > first to reduce the state tracking complexity, particularly if these
> > states end up written to the log via the intent.
> > 
> > Hmm, I'm starting to think that maybe what we really need to do here is
> > step back from the code and logically map out what these states and the
> > resulting operation flow needs to be, particularly since there are so
> > many variations between different format conversions, renames, remote
> > blocks, etc. Once we have this whole mess mapped out, coding it up
> > should be more of an effort in refactoring.
> 
> Yep.
> 
> > > xfs_attri_recover then becomes much simpler -- we're passed in the
> > > reconstructed log item from which we figure out which step we need to
> > > do.  We call xfs_trans_attr() to do that one step, but unlike
> > > _finish_item, we use the new state to construct a *new* attr intent and
> > > attach it to the transaction, then call xfs_defer_move at the end to
> > > move all the queued defer_ops to the parent_tp because log recovery
> > > requires us to recover all the incomplete log intent items before
> > > finishing any new ones that were created as part of recovery.
> > > 
> > > This does mean that we end up with dramatically separate code paths for
> > > defer ops attr setting vs. regular attr setting, but as you point out
> > > the parent pointer feature will give the new code paths plenty of exercise.
> > > Tying the new log intent items to a new feature bit is key to preventing
> > > old kernels from stumbling across our new intent items, so we needed to
> > > preserve the old attr set paths anyway.
> > > 
> > 
> > That's a good point wrt the other discussion around the direct xattr
> > codepath. It sounds like we do need to keep that entire path around
> > regardless to support v4 filesystems and such. The current series just
> > unconditionally switches things over to deferred ops.
> 
> Er... yikes.  XFS cannot suddenly introduce new ondisk formats for
> existing filesystems.
> 

We were discussing whether to preserve the existing codepath with
respect to flexibility/efficiency, but the whole backwards compatibility
aspect just didn't register with me until you mentioned it. I think that
kind of makes that decision for us. :P
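
In other words, the choice of path ends up keyed off a feature bit
rather than being unconditional. A minimal sketch of that, assuming a
hypothetical feature flag (FAKE_FEAT_LOG_ATTR, fake_mount and
attr_set_path are invented names, not anything that exists in XFS
today):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bit; not a real XFS superblock flag. */
#define FAKE_FEAT_LOG_ATTR	(1u << 0)

/* Hypothetical stand-in for the mount structure. */
struct fake_mount {
	uint32_t	features;
};

enum attr_path {
	ATTR_PATH_DIRECT,	/* existing, non-intent attr set path */
	ATTR_PATH_DEFERRED,	/* new intent-based path */
};

static bool fake_has_log_attr(const struct fake_mount *mp)
{
	assert(mp != NULL);
	return (mp->features & FAKE_FEAT_LOG_ATTR) != 0;
}

/*
 * Filesystems without the feature bit never see an attr intent in the
 * log, so older kernels can still mount and recover them.
 */
static enum attr_path attr_set_path(const struct fake_mount *mp)
{
	return fake_has_log_attr(mp) ? ATTR_PATH_DEFERRED : ATTR_PATH_DIRECT;
}
```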

> > > Anyway, if this all seems confusing, you can track me down, because I
> > > wrote most of this system and therefore have forgotten all of
> > > it^W^W^W^W^Wam available to help. :)
> > > 
> > > > > > 
> > > > > > I was also starting to wonder if maybe I could do some refactoring in
> > > > > > xfs_defer_finish_noroll to capture the common code associated with the
> > > > > > -EAGAIN handling.  Then maybe we could make a function pointer that we can
> > > > > > pass through the finish_item interface.  The idea being that subroutines
> > > > > > could use the function pointer to cycle out the transaction when needed
> > > > > > instead of having to record states and back out like this. It'd be a new
> > > 
> > > The state tracking and rolling is already built into xfs_defer.c. :)
> > > 
> > > > > > parameter to pipe around, but it'd be more efficient than the state machine,
> > > > > > and less surgery in the refactor.  And maybe a blessing to any other
> > > > > > operations that might need to go through this transition in the future.
> > > > > > Thoughts?
> > > > > > 
> > > > > 
> > > > > That's an interesting idea. It still strikes me as a bit of a
> > > > > fallback/hack as opposed to organizing the code to properly fit into the
> > > > > dfops infrastructure, but it could be useful as a transient solution.
> > > > >  From a high level, it looks like we'd have to create a new intent, relog
> > > > > this item and all remaining items associated with the dfp to it, roll
> > > > > the tx, and finally create a done item associated with the intent in the
> > > > > new tx. You'd need access to the dfp for some of that, so it's not
> > > > > immediately clear to me that this ends up much easier than fixing up
> > > > > the xattr code.
> > > 
> > > (I think the code that handles EAGAIN being returned from finish_item
> > > does this for you....)
> > > 
> > 
> > Yeah, I'm not totally sure it's an ideal/feasible approach, but for the
> > sake of clarity I think what Allison is getting at is that if there was
> > a way to trigger a dfops -EAGAIN roll sequence via a callback/helper
> > function, we wouldn't need to refactor the xattr subsystem to have
> > -EAGAIN return points. Instead we could just invoke the callback at the
> > existing roll points and achieve the same behavior (in theory). It's
> > kind of like providing an inside-out xfs_defer_finish_noroll() -EAGAIN
> > implementation via a helper function for code down in ->finish_item().
> 
> <nod> I grok that, but wonder if we really can invoke a roll while in
> the middle of ->finish_item...?  Anyway, we can set aside my confusion
> for now because I really think we need to see a map of all the pieces 
> 

Ok, I'm not really sure either. It was just an idea to bat around with
the rest. I agree that an informal, logical map/breakdown is the best
next step here. That gives us something concrete to review and refine.

Brian

> --D
> 
> > Brian
> > 
> > > > > 
> > > > > BTW, if we did end up with something like that I'd probably prefer to
> > > > > see it as an exported dfops helper function as opposed to a function
> > > > > pointer being passed around, if possible.
> > > > > 
> > > > 
> > > > Alrighty, I think for now I may try to pursue something more like what you
> > > > proposed in the next patch and see where I get first.  Maybe I'll come back
> > > > to this later if for some reason it doesn't work out, but I think what you
> > > > have there is reasonable.
> > > 
> > > <nod>
> > > 
> > > --D
> > > 
> > > > 
> > > > Thanks again for the reviews!
> > > > Allison
> > > > 
> > > > > Brian
> > > > > 
> > > > > > Thanks again for the reviews!
> > > > > > 
> > > > > > Allison
> > > > > > 
> > > > > > > 
> > > > > > > Brian
> > > > > > > 
> > > > > > > 
> > > > > > > >    fs/xfs/libxfs/xfs_attr.h | 18 +++++++++++++++++-
> > > > > > > >    fs/xfs/scrub/common.c    |  2 ++
> > > > > > > >    fs/xfs/xfs_acl.c         |  2 ++
> > > > > > > >    fs/xfs/xfs_attr_item.c   |  2 +-
> > > > > > > >    fs/xfs/xfs_ioctl.c       |  2 ++
> > > > > > > >    fs/xfs/xfs_ioctl32.c     |  2 ++
> > > > > > > >    fs/xfs/xfs_iops.c        |  1 +
> > > > > > > >    fs/xfs/xfs_xattr.c       |  1 +
> > > > > > > >    8 files changed, 28 insertions(+), 2 deletions(-)
> > > > > > > > 
> > > > > > > > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > > > > > > > index 974c963..4ce3b0a 100644
> > > > > > > > --- a/fs/xfs/libxfs/xfs_attr.h
> > > > > > > > +++ b/fs/xfs/libxfs/xfs_attr.h
> > > > > > > > @@ -77,6 +77,13 @@ typedef struct attrlist_ent {	/* data from attr_list() */
> > > > > > > >    	char	a_name[1];	/* attr name (NULL terminated) */
> > > > > > > >    } attrlist_ent_t;
> > > > > > > > +/* Attr state machine types */
> > > > > > > > +enum xfs_attr_state {
> > > > > > > > +	XFS_ATTR_STATE1 = 1,
> > > > > > > > +	XFS_ATTR_STATE2 = 2,
> > > > > > > > +	XFS_ATTR_STATE3 = 3,
> > > > > > > > +};
> > > > > > > > +
> > > > > > > >    /*
> > > > > > > >     * List of attrs to commit later.
> > > > > > > >     */
> > > > > > > > @@ -88,7 +95,16 @@ struct xfs_attr_item {
> > > > > > > >    	void		  *xattri_name;	      /* attr name */
> > > > > > > >    	uint32_t	  xattri_name_len;    /* length of name */
> > > > > > > >    	uint32_t	  xattri_flags;       /* attr flags */
> > > > > > > > -	struct list_head  xattri_list;
> > > > > > > > +
> > > > > > > > +	/*
> > > > > > > > +	 * Delayed attr parameters that need to remain instantiated
> > > > > > > > +	 * across transaction rolls during the defer finish
> > > > > > > > +	 */
> > > > > > > > +	struct xfs_buf		*xattri_leaf_bp;  /* Leaf buf to release */
> > > > > > > > +	enum xfs_attr_state	xattri_state;	  /* state machine marker */
> > > > > > > > +	struct xfs_da_args	xattri_args;	  /* args context */
> > > > > > > > +
> > > > > > > > +	struct list_head	xattri_list;
> > > > > > > >    	/*
> > > > > > > >    	 * A byte array follows the header containing the file name and
> > > > > > > > diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
> > > > > > > > index 0c54ff5..270c32e 100644
> > > > > > > > --- a/fs/xfs/scrub/common.c
> > > > > > > > +++ b/fs/xfs/scrub/common.c
> > > > > > > > @@ -30,6 +30,8 @@
> > > > > > > >    #include "xfs_rmap_btree.h"
> > > > > > > >    #include "xfs_log.h"
> > > > > > > >    #include "xfs_trans_priv.h"
> > > > > > > > +#include "xfs_da_format.h"
> > > > > > > > +#include "xfs_da_btree.h"
> > > > > > > >    #include "xfs_attr.h"
> > > > > > > >    #include "xfs_reflink.h"
> > > > > > > >    #include "scrub/xfs_scrub.h"
> > > > > > > > diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
> > > > > > > > index 142de8d..9b1b93e 100644
> > > > > > > > --- a/fs/xfs/xfs_acl.c
> > > > > > > > +++ b/fs/xfs/xfs_acl.c
> > > > > > > > @@ -10,6 +10,8 @@
> > > > > > > >    #include "xfs_mount.h"
> > > > > > > >    #include "xfs_inode.h"
> > > > > > > >    #include "xfs_acl.h"
> > > > > > > > +#include "xfs_da_format.h"
> > > > > > > > +#include "xfs_da_btree.h"
> > > > > > > >    #include "xfs_attr.h"
> > > > > > > >    #include "xfs_trace.h"
> > > > > > > >    #include <linux/slab.h>
> > > > > > > > diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> > > > > > > > index 0ea19b4..36e6d1e 100644
> > > > > > > > --- a/fs/xfs/xfs_attr_item.c
> > > > > > > > +++ b/fs/xfs/xfs_attr_item.c
> > > > > > > > @@ -19,10 +19,10 @@
> > > > > > > >    #include "xfs_rmap.h"
> > > > > > > >    #include "xfs_inode.h"
> > > > > > > >    #include "xfs_icache.h"
> > > > > > > > -#include "xfs_attr.h"
> > > > > > > >    #include "xfs_shared.h"
> > > > > > > >    #include "xfs_da_format.h"
> > > > > > > >    #include "xfs_da_btree.h"
> > > > > > > > +#include "xfs_attr.h"
> > > > > > > >    static inline struct xfs_attri_log_item *ATTRI_ITEM(struct xfs_log_item *lip)
> > > > > > > >    {
> > > > > > > > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > > > > > > > index ab341d6..c8728ca 100644
> > > > > > > > --- a/fs/xfs/xfs_ioctl.c
> > > > > > > > +++ b/fs/xfs/xfs_ioctl.c
> > > > > > > > @@ -16,6 +16,8 @@
> > > > > > > >    #include "xfs_rtalloc.h"
> > > > > > > >    #include "xfs_itable.h"
> > > > > > > >    #include "xfs_error.h"
> > > > > > > > +#include "xfs_da_format.h"
> > > > > > > > +#include "xfs_da_btree.h"
> > > > > > > >    #include "xfs_attr.h"
> > > > > > > >    #include "xfs_bmap.h"
> > > > > > > >    #include "xfs_bmap_util.h"
> > > > > > > > diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
> > > > > > > > index 5001dca..23f6990 100644
> > > > > > > > --- a/fs/xfs/xfs_ioctl32.c
> > > > > > > > +++ b/fs/xfs/xfs_ioctl32.c
> > > > > > > > @@ -21,6 +21,8 @@
> > > > > > > >    #include "xfs_fsops.h"
> > > > > > > >    #include "xfs_alloc.h"
> > > > > > > >    #include "xfs_rtalloc.h"
> > > > > > > > +#include "xfs_da_format.h"
> > > > > > > > +#include "xfs_da_btree.h"
> > > > > > > >    #include "xfs_attr.h"
> > > > > > > >    #include "xfs_ioctl.h"
> > > > > > > >    #include "xfs_ioctl32.h"
> > > > > > > > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > > > > > > > index e73c21a..561c467 100644
> > > > > > > > --- a/fs/xfs/xfs_iops.c
> > > > > > > > +++ b/fs/xfs/xfs_iops.c
> > > > > > > > @@ -17,6 +17,7 @@
> > > > > > > >    #include "xfs_acl.h"
> > > > > > > >    #include "xfs_quota.h"
> > > > > > > >    #include "xfs_error.h"
> > > > > > > > +#include "xfs_da_btree.h"
> > > > > > > >    #include "xfs_attr.h"
> > > > > > > >    #include "xfs_trans.h"
> > > > > > > >    #include "xfs_trace.h"
> > > > > > > > diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
> > > > > > > > index 3013746..938e81d 100644
> > > > > > > > --- a/fs/xfs/xfs_xattr.c
> > > > > > > > +++ b/fs/xfs/xfs_xattr.c
> > > > > > > > @@ -11,6 +11,7 @@
> > > > > > > >    #include "xfs_mount.h"
> > > > > > > >    #include "xfs_da_format.h"
> > > > > > > >    #include "xfs_inode.h"
> > > > > > > > +#include "xfs_da_btree.h"
> > > > > > > >    #include "xfs_attr.h"
> > > > > > > >    #include "xfs_attr_leaf.h"
> > > > > > > >    #include "xfs_acl.h"
> > > > > > > > -- 
> > > > > > > > 2.7.4
> > > > > > > > 


