On Wed, Apr 24, 2019 at 08:25:33AM -0700, Darrick J. Wong wrote: > On Wed, Apr 24, 2019 at 08:17:48AM -0400, Brian Foster wrote: > > On Tue, Apr 23, 2019 at 09:10:16PM -0700, Darrick J. Wong wrote: > > > Sorry I'm late back to the party... > > > > > > On Tue, Apr 23, 2019 at 07:24:40PM -0700, Allison Henderson wrote: > > > > > > > > On 4/23/19 6:20 AM, Brian Foster wrote: > > > > > On Mon, Apr 22, 2019 at 03:01:27PM -0700, Allison Henderson wrote: > > > > > > > > > > > > > > > > > > On 4/22/19 6:03 AM, Brian Foster wrote: > > > > > > > On Fri, Apr 12, 2019 at 03:50:34PM -0700, Allison Henderson wrote: > > > > > > > > This patch modifies xfs_attr_item to store a xfs_da_args, a xfs_buf pointer > > > > > > > > and a new state type. We will use these in the next patch when > > > > > > > > we modify xfs_set_attr_args to roll transactions by returning EAGAIN. > > > > > > > > Because the subroutines of this function modify the contents of these > > > > > > > > structures, we need to find a place to store them where they remain > > > > > > > > instantiated across multiple calls to xfs_set_attr_args. > > > > > > > > > > > > > > > > Signed-off-by: Allison Henderson <allison.henderson@xxxxxxxxxx> > > > > > > > > --- > > > > > > > > > > > > > > I see Darrick has already commented on the whole state thing. I'll > > > > > > > probably have to grok the next patch to comment further, but just a > > > > > > > couple initial thoughts: > > > > > > > > > > > > > > First, I hit a build failure with this patch. It looks like there's a > > > > > > > missed include in the scrub code: > > > > > > > > > > > > > > ... > > > > > > > CC [M] fs/xfs/scrub/repair.o > > > > > > > In file included from fs/xfs/scrub/repair.c:32: > > > > > > > fs/xfs/libxfs/xfs_attr.h:105:21: error: field ‘xattri_args’ has incomplete type > > > > > > > struct xfs_da_args xattri_args; /* args context */ > > > > > > Hmm, ok. I'll get that corrected, I probably need to clean out my workspace > > > > > > and build from scratch. > > > > > > > > > > > > > ... > > > > > > > > > > > > > > Second, the commit log suggests that the states will reflect the current > > > > > > > transaction roll points (i.e., establishing re-entry points down in > > > > > > > xfs_attr_set_args(). I'm kind of wondering if we should break these > > > > > > > xattr set sub-sequences down into smaller helper functions (refactoring > > > > > > > the existing code as we go) such that the mechanism could technically be > > > > > > I had had the thought of "why not just give each step of setting an > > > attribute its own log item, so we don't have to have this STATE_NNN > > > business?" but then realized that will generate an insane amount of > > > boilerplate, and you're already close to a better solution, so I shut up > > > to think harder. :) > > > > > > > The thought of separating things down into smaller "ops" popped into my > > head (not necessarily separate/smaller log items), but I hadn't really > > thought it through to this point... > > > > > > > > > used deferred or not. Re: the previous thought on whether to defer xattr > > > > > > > removes or not, there might also be cases where there's not a need to > > > > > > > defer xattr sets. > > > > > > > > > > > > > > E.g., taking a quick peek into the next patch, the state 1 case in > > > > > > > xfs_attr_try_sf_addname() is actually a transaction commit, which I > > > > > > > think means we're done. We'd have done an attr memory allocation, > > > > > > > deferred op and transaction roll where none was necessary so it might > > > > > > > not be worth it to defer in that scenario. Hmm, it also looks like we > > > > > > > return -EAGAIN in places where we've not actually done any work, like if > > > > > > > a shortform add attempt returns -ENOSPC (or the -EAGAIN return before we > > > > > > > even attempt the sf add). That kind of looks like a waste of transaction > > > > > > > rolls and further suggests it might be cleaner to break this whole path > > > > > > > down into helpers and put it back together in a way more conducive to > > > > > > > deferred operations. > > > > > > Er, agreed: > > > > > > > > > Yes, this area is a bit of a wart the way it is right now. I think you're > > > > > > right in that ultimately we may end up having to do a lot of refactoring in > > > > > > order to have more efficient "re-entry points". The state machine is hard > > > > > > to get into subroutines, so it's limited in use in the top level function. > > > > > > So my current understanding of the problem is that we have this big old > > > xfs_attr_set_args function that does multiple responsibilities requiring > > > transaction rolls, which we can't do directly inside a ->finish_item > > > handler: > > > > > > 1. If no attr fork, add one. > > > 2. If shortform attr fork, try to put it in the sf area. > > > 3. If shortform attr fork and out of space, convert to leaf format. > > > 4. Add attr to leaf/node attr tree. > > > > > > > And there are a bunch of tx rolls down in the #4 codepath that this > > series currently just tosses away. I'm not quite sure how appropriate > > that is, but I also don't think we necessarily need to preserve each and > > every transaction roll as implemented by the current code. > > > > IOW, I think it absolutely makes sense to step back from the current > > behavior and reassess the best/required places to roll xattr ops in > > progress as well as the transaction reservation itself. > > Yes, it would help to make a list of every small step that could > possibly be required to set an attribute. That will help narrow down > how many defer op pieces are needed. > > Another thought I had is that having the finish_item continually logging > a new intent with the latest state means that we can free the old intent > item, which helps us avoid the problem of pinning the log tail at that > first intent item while we scramble around doing a whole lot of rolling > and other work to get to the done item. > > > > So how about this: refactor each of these pieces into a separate > > > function, then add a separate XFS_ATTR_OP_FLAGS_* value for each of > > > these little pieces. xfs_trans_attr() can call the appropriate little > > > function for the OP_FLAG and xfs_attr_finish_item can figure out which > > > state comes next based on the return value. > > > > > > By directly mapping distinct OP_FLAGS to each piece of the attr setting > > > puzzle, you can use the existing "roll and come back" part of the defer > > > ops machinery. > > > > > > If _finish_item thinks we're done then we just exit. Otherwise, store > > > the new state in the (struct xfs_attr_item *) parameter passed into > > > _finish_item and return -EAGAIN, which puts the defer item back on the > > > defer op list, logs a new xattr intent with the new state, rolls the > > > transaction, and tries to finish the attr again. I think you've already > > > done this last part. > > > > > > > That sounds plausible to me. One concern I have is that I think we > > should try to avoid creating more unnecessary complexity in the dfops > > state mechanism simply to accommodate a messy xattr implementation. For > > example, consider the following sequence for a simple set of an xattr > > that requires leaf format and remote value block(s): > > > > - try sf add > > - returns -ENOSPC, convert to leaf and roll tx > > - attempt to add the xattr (xfs_attr_leaf_addname()) > > - if -ENOSPC, convert to node and call xfs_attr_node_addname() > > - else call xfs_attr3_leaf_add_work() > > - add entry > > - if remoteval, set INCOMPLETE > > - roll tx > > - if remoteval, call xfs_attr_rmtval_set() > > - block allocation, tx roll loop > > - copy remote value into bufs, xfs_bwrite() > > - if remoteval, xfs_attr3_leaf_clearflag() > > - clear INCOMPLETE > > - update/log rmt pointers > > - roll tx > > > > I'm wondering 1.) how much of this is necessary with an intent based > > implementation and 2.) how much of this can be refactored to not require > > complex state tracking. > > > > For example, all of the format conversions that occur before we actually > > make any modifications associated with the xattr (i.e., -ENOSPC returns > > from the current format) seem to me could easily be performed and > > immediately return -EAGAIN without any state tracking. The retry should > > pick up the current format of the fork and retry there. Thus, ISTM we > > could drop the whole xfs_attr_leaf_addname() -> xfs_attr3_leaf_to_node() > > -> xfs_attr_node_addname() codepath in favor of a format conversion and > > -EAGAIN retry that calls directly into xfs_attr_node_addname(). > > That had been my other thought -- in theory we keep the inode locked > across all the transaction rolls, so we could auto-detect what we need > to do. > Indeed, at the very least it might reduce the number of "on-disk" state markers we have to define. > > Once we have leaf format and we're doing remote block allocation, how > > much could we get away with by re-looking up the entry, finding that > > we're still short of remote blocks and performing another > > xfs_bmapi_write() -> -EAGAIN cycle until we're good to copy in the xattr > > value? > > > > What about all this INCOMPLETE stuff? Do we even need that with an > > intent based implementation? > > No. AFAIK the INCOMPLETE flag exists to hide attrs from userspace until > we're totally done setting them up, and is therefore unnecessary with an > intent implementation. Repair zaps any INCOMPLETE attrs it finds. > > > My understanding was that was because we > > had to roll the transaction and thus could leave an incomplete xattr on > > disk. I haven't looked too far into it so perhaps there's more to it > > than that, but if not and this is no longer a problem with an intent > > based implementation then perhaps much of that code and associated tx > > rolls can be bypassed as well. > > Getting rid of the INCOMPLETE wonkiness would be the strongest argument > for switching the regular attr manipulation paths to use intents, though > we'd have to toggle it with some feature or other. > > (Some feature or other being parent pointers, or possibly just migrating > the free space tracking parts of dir3 to a "new" attr4 format for better > speed.) > > > This is not to say that we won't require any such state tracking as > > you've described above. The whole block allocation thing above may > > require a state marker to get around attempts to set the xattr name > > again and get back to the remote value block allocation code. It also > > looks like we can do post xattr set format changes (i.e., node -> leaf, > > leaf -> sf) that might require something like that to make sure we don't > > go an retry an xattr set we've already completed. The point is just that > > I'd prefer that we explore how much we can simplify this mess of an > > implementation as much as possible (the above is all very handwavy) > > first to reduce the state tracking complexity, particularly if these > > states end up written to the log via the intent. > > > > Hmm, I'm starting to think that maybe what we really need to do here is > > step back from the code and logically map out what these states and the > > resulting operation flow needs to be, particularly since there are so > > many variations between different format conversions, renames, remote > > blocks, etc. Once we have this whole mess mapped out, coding it up > > should be more of an effort in refactoring. > > Yep. > > > > xfs_attri_recover then becomes much simpler -- we're passed in the > > > reconstructed log item from which we figure out which step we need to > > > do. We call xfs_trans_attr() to do that one step, but unlike > > > _finish_item, we use the new state to construct a *new* attr intent and > > > attach it to the transaction, then call xfs_defer_move at the end to > > > move all the queued defer_ops to the parent_tp because log recovery > > > requires us to recover all the incomplete log intent items before > > > finishing any new ones that were created as part of recovery. > > > > > > This does mean that we end up with dramatically separate code paths for > > > defer ops attr setting vs. regular attr setting, but as you point out > > > the parent pointer feature will give the new code paths plenty of exercise. > > > Tying the new log intent items to a new feature bit is key to preventing > > > old kernels from stumbling across our new intent items, so we needed to > > > preserve the old attr set paths anyway. > > > > > > > That's a good point wrt to the other discussion around the direct xattr > > codepath. It sounds like we do need to keep that entire path around > > regardless to support v4 filesystems and such. The current series just > > unconditionally switches things over to deferred ops. > > Er... yikes. XFS cannot suddenly introduce new ondisk formats for > existing filesystems. > We were discussing whether to preserve the existing codepath with respect to flexibility/efficiency, but the whole backwards compatibility aspect just didn't register with me until you mentioned it. I think that kind of makes that decision for us. :P > > > Anyway, if this all seems confusing, you can track me down, because I > > > wrote most of this system and therefore have forgotten all of > > > it^W^W^W^W^Wam available to help. :) > > > > > > > > > > > > > > > I was also starting to wonder if maybe I could do some refactoring in > > > > > > xfs_defer_finish_noroll to capture the common code associated with the > > > > > > -EAGAIN handling. Then maybe we could make a function pointer that we can > > > > > > pass through the finish_item interface. The idea being that subroutines > > > > > > could use the function pointer to cycle out the transaction when needed > > > > > > instead of having to record states and back out like this. It'd be a new > > > > > > The state tracking and rolling is already built into xfs_defer.c. :) > > > > > > > > > parameter to pipe around, but it'd be more efficient than the state machine, > > > > > > and less surgery in the refactor. And maybe a blessing to any other > > > > > > operations that might need to go through this transition in the future. > > > > > > Thoughts? > > > > > > > > > > > > > > > > That's an interesting idea. It still strikes me as a bit of a > > > > > fallback/hack as opposed to organizing the code to properly fit into the > > > > > dfops infrastructure, but it could be useful as a transient solution. > > > > > From a high level, it looks like we'd have to create a new intent, relog > > > > > this item and all remaining items associated with the dfp to it, roll > > > > > the tx, and finally create a done item associated with the intent in the > > > > > new tx. You'd need access to the dfp for some of that, so it's not > > > > > immediately clear to me that this ends up much easier than fixing up > > > > > the xattr code. > > > > > > (I think the code that handles EAGAIN being returned from finish_item > > > does this for you....) > > > > > > > Yeah, I'm not totally sure it's an ideal/feasible approach, but for the > > sake of clarity I think what Allison is getting at is that if there was > > a way to trigger a dfops -EAGAIN roll sequence via a callback/helper > > function, we wouldn't need to refactor the xattr subsystem to have > > -EAGAIN return points. Instead we could just invoke the callback at the > > existing roll points and achieve the same behavior (in theory). It's > > kind of like providing an inside-out xfs_defer_finish_noroll() -EAGAIN > > implementation via a helper function for code down in ->finish_item(). > > <nod> I grok that, but wonder if we really can invoke a roll while in > the middle of ->finish_item...? Anyway, we can set aside my confusion > for now because I really think we need to see a map of all the pieces > Ok, I'm not really sure either. It was just an idea to bat around with the rest. I agree that an informal, logical map/breakdown is the best next step here. That gives us something concrete to review and refine. Brian > --D > > > Brian > > > > > > > > > > > > BTW, if we did end up with something like that I'd probably prefer to > > > > > see it as an exported dfops helper function as opposed to a function > > > > > pointer being passed around, if possible. > > > > > > > > > > > > > Alrighty, I think for now I may try to pursue something more like what you > > > > proposed in the next patch and see where I get first. Maybe I'll come back > > > > to this later if for some reason it doesn't work out, but I think what you > > > > have there is reasonable. > > > > > > <nod> > > > > > > --D > > > > > > > > > > > Thanks again for the reviews! > > > > Allison > > > > > > > > > Brian > > > > > > > > > > > Thanks again for the reviews! > > > > > > > > > > > > Allison > > > > > > > > > > > > > > > > > > > > Brian > > > > > > > > > > > > > > > > > > > > > > fs/xfs/libxfs/xfs_attr.h | 18 +++++++++++++++++- > > > > > > > > fs/xfs/scrub/common.c | 2 ++ > > > > > > > > fs/xfs/xfs_acl.c | 2 ++ > > > > > > > > fs/xfs/xfs_attr_item.c | 2 +- > > > > > > > > fs/xfs/xfs_ioctl.c | 2 ++ > > > > > > > > fs/xfs/xfs_ioctl32.c | 2 ++ > > > > > > > > fs/xfs/xfs_iops.c | 1 + > > > > > > > > fs/xfs/xfs_xattr.c | 1 + > > > > > > > > 8 files changed, 28 insertions(+), 2 deletions(-) > > > > > > > > > > > > > > > > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h > > > > > > > > index 974c963..4ce3b0a 100644 > > > > > > > > --- a/fs/xfs/libxfs/xfs_attr.h > > > > > > > > +++ b/fs/xfs/libxfs/xfs_attr.h > > > > > > > > @@ -77,6 +77,13 @@ typedef struct attrlist_ent { /* data from attr_list() */ > > > > > > > > char a_name[1]; /* attr name (NULL terminated) */ > > > > > > > > } attrlist_ent_t; > > > > > > > > +/* Attr state machine types */ > > > > > > > > +enum xfs_attr_state { > > > > > > > > + XFS_ATTR_STATE1 = 1, > > > > > > > > + XFS_ATTR_STATE2 = 2, > > > > > > > > + XFS_ATTR_STATE3 = 3, > > > > > > > > +}; > > > > > > > > + > > > > > > > > /* > > > > > > > > * List of attrs to commit later. > > > > > > > > */ > > > > > > > > @@ -88,7 +95,16 @@ struct xfs_attr_item { > > > > > > > > void *xattri_name; /* attr name */ > > > > > > > > uint32_t xattri_name_len; /* length of name */ > > > > > > > > uint32_t xattri_flags; /* attr flags */ > > > > > > > > - struct list_head xattri_list; > > > > > > > > + > > > > > > > > + /* > > > > > > > > + * Delayed attr parameters that need to remain instantiated > > > > > > > > + * across transaction rolls during the defer finish > > > > > > > > + */ > > > > > > > > + struct xfs_buf *xattri_leaf_bp; /* Leaf buf to release */ > > > > > > > > + enum xfs_attr_state xattri_state; /* state machine marker */ > > > > > > > > + struct xfs_da_args xattri_args; /* args context */ > > > > > > > > + > > > > > > > > + struct list_head xattri_list; > > > > > > > > /* > > > > > > > > * A byte array follows the header containing the file name and > > > > > > > > diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c > > > > > > > > index 0c54ff5..270c32e 100644 > > > > > > > > --- a/fs/xfs/scrub/common.c > > > > > > > > +++ b/fs/xfs/scrub/common.c > > > > > > > > @@ -30,6 +30,8 @@ > > > > > > > > #include "xfs_rmap_btree.h" > > > > > > > > #include "xfs_log.h" > > > > > > > > #include "xfs_trans_priv.h" > > > > > > > > +#include "xfs_da_format.h" > > > > > > > > +#include "xfs_da_btree.h" > > > > > > > > #include "xfs_attr.h" > > > > > > > > #include "xfs_reflink.h" > > > > > > > > #include "scrub/xfs_scrub.h" > > > > > > > > diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c > > > > > > > > index 142de8d..9b1b93e 100644 > > > > > > > > --- a/fs/xfs/xfs_acl.c > > > > > > > > +++ b/fs/xfs/xfs_acl.c > > > > > > > > @@ -10,6 +10,8 @@ > > > > > > > > #include "xfs_mount.h" > > > > > > > > #include "xfs_inode.h" > > > > > > > > #include "xfs_acl.h" > > > > > > > > +#include "xfs_da_format.h" > > > > > > > > +#include "xfs_da_btree.h" > > > > > > > > #include "xfs_attr.h" > > > > > > > > #include "xfs_trace.h" > > > > > > > > #include <linux/slab.h> > > > > > > > > diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c > > > > > > > > index 0ea19b4..36e6d1e 100644 > > > > > > > > --- a/fs/xfs/xfs_attr_item.c > > > > > > > > +++ b/fs/xfs/xfs_attr_item.c > > > > > > > > @@ -19,10 +19,10 @@ > > > > > > > > #include "xfs_rmap.h" > > > > > > > > #include "xfs_inode.h" > > > > > > > > #include "xfs_icache.h" > > > > > > > > -#include "xfs_attr.h" > > > > > > > > #include "xfs_shared.h" > > > > > > > > #include "xfs_da_format.h" > > > > > > > > #include "xfs_da_btree.h" > > > > > > > > +#include "xfs_attr.h" > > > > > > > > static inline struct xfs_attri_log_item *ATTRI_ITEM(struct xfs_log_item *lip) > > > > > > > > { > > > > > > > > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c > > > > > > > > index ab341d6..c8728ca 100644 > > > > > > > > --- a/fs/xfs/xfs_ioctl.c > > > > > > > > +++ b/fs/xfs/xfs_ioctl.c > > > > > > > > @@ -16,6 +16,8 @@ > > > > > > > > #include "xfs_rtalloc.h" > > > > > > > > #include "xfs_itable.h" > > > > > > > > #include "xfs_error.h" > > > > > > > > +#include "xfs_da_format.h" > > > > > > > > +#include "xfs_da_btree.h" > > > > > > > > #include "xfs_attr.h" > > > > > > > > #include "xfs_bmap.h" > > > > > > > > #include "xfs_bmap_util.h" > > > > > > > > diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c > > > > > > > > index 5001dca..23f6990 100644 > > > > > > > > --- a/fs/xfs/xfs_ioctl32.c > > > > > > > > +++ b/fs/xfs/xfs_ioctl32.c > > > > > > > > @@ -21,6 +21,8 @@ > > > > > > > > #include "xfs_fsops.h" > > > > > > > > #include "xfs_alloc.h" > > > > > > > > #include "xfs_rtalloc.h" > > > > > > > > +#include "xfs_da_format.h" > > > > > > > > +#include "xfs_da_btree.h" > > > > > > > > #include "xfs_attr.h" > > > > > > > > #include "xfs_ioctl.h" > > > > > > > > #include "xfs_ioctl32.h" > > > > > > > > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c > > > > > > > > index e73c21a..561c467 100644 > > > > > > > > --- a/fs/xfs/xfs_iops.c > > > > > > > > +++ b/fs/xfs/xfs_iops.c > > > > > > > > @@ -17,6 +17,7 @@ > > > > > > > > #include "xfs_acl.h" > > > > > > > > #include "xfs_quota.h" > > > > > > > > #include "xfs_error.h" > > > > > > > > +#include "xfs_da_btree.h" > > > > > > > > #include "xfs_attr.h" > > > > > > > > #include "xfs_trans.h" > > > > > > > > #include "xfs_trace.h" > > > > > > > > diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c > > > > > > > > index 3013746..938e81d 100644 > > > > > > > > --- a/fs/xfs/xfs_xattr.c > > > > > > > > +++ b/fs/xfs/xfs_xattr.c > > > > > > > > @@ -11,6 +11,7 @@ > > > > > > > > #include "xfs_mount.h" > > > > > > > > #include "xfs_da_format.h" > > > > > > > > #include "xfs_inode.h" > > > > > > > > +#include "xfs_da_btree.h" > > > > > > > > #include "xfs_attr.h" > > > > > > > > #include "xfs_attr_leaf.h" > > > > > > > > #include "xfs_acl.h" > > > > > > > > -- > > > > > > > > 2.7.4 > > > > > > > >