On Thu, Aug 18, 2022 at 06:05:54PM -0700, Alli wrote:
> On Tue, 2022-08-16 at 13:41 -0700, Alli wrote:
> > On Mon, 2022-08-15 at 22:07 -0700, Darrick J. Wong wrote:
> > > On Tue, Aug 16, 2022 at 10:54:38AM +1000, Dave Chinner wrote:
> > > > On Thu, Aug 11, 2022 at 06:55:16PM -0700, Alli wrote:
> > > > > On Wed, 2022-08-10 at 16:12 +1000, Dave Chinner wrote:
> > > > > > On Tue, Aug 09, 2022 at 10:01:49PM -0700, Alli wrote:
> > > > > > > On Wed, 2022-08-10 at 11:58 +1000, Dave Chinner wrote:
> > > > > > > > On Tue, Aug 09, 2022 at 09:52:55AM -0700, Darrick J. Wong wrote:
> > > > > > > > > On Thu, Aug 04, 2022 at 12:39:56PM -0700, Allison Henderson wrote:
> > > > > > > > > > Recent parent pointer testing has exposed a bug in the
> > > > > > > > > > underlying attr replay. A multi transaction replay
> > > > > > > > > > currently performs a single step of the replay, then
> > > > > > > > > > defers the rest if there is more to do.
> > > > > > > >
> > > > > > > > Yup.
> > > > > > > >
> > > > > > > > > > This causes race conditions with other attr replays
> > > > > > > > > > that might be recovered before the remaining deferred
> > > > > > > > > > work has had a chance to finish.
> > > > > > > >
> > > > > > > > What other attr replays are we racing against? There can
> > > > > > > > only be one incomplete attr item intent/done chain per
> > > > > > > > inode present in log recovery, right?
> > > > > > > No, a rename queues up a set and remove before committing
> > > > > > > the transaction. One for the new parent pointer, and another
> > > > > > > to remove the old one.
> > > > > >
> > > > > > Ah.
> > > > > > That really needs to be described in the commit message -
> > > > > > changing from "single intent chain per object" to "multiple
> > > > > > concurrent independent and unserialised intent chains per
> > > > > > object" is a pretty important design rule change...
> > > > > >
> > > > > > The whole point of intents is to allow complex, multi-stage
> > > > > > operations on a single object to be sequenced in a tightly
> > > > > > controlled manner. They weren't intended to be run as
> > > > > > concurrent lines of modification on single items; if you need
> > > > > > to do two modifications on an object, the intent chain ties
> > > > > > the two modifications together into a single whole.
> > > > > >
> > > > > > One of the reasons I rewrote the attr state machine for LARP
> > > > > > was to enable new multiple attr operation chains to be easily
> > > > > > built from the entry points the state machine provides. Parent
> > > > > > attr rename needs a new intent chain to be built, not run
> > > > > > multiple independent intent chains for each modification.
> > > > > >
> > > > > > > It can't be an attr replace because technically the names
> > > > > > > are different.
> > > > > >
> > > > > > I disagree - we have all the pieces we need in the state
> > > > > > machine already, we just need to define separate attr names
> > > > > > for the remove and insert steps in the attr intent.
> > > > > >
> > > > > > That is, the "replace" operation we execute when an attr set
> > > > > > overwrites the value is "technically" a "replace value"
> > > > > > operation, but we actually implement it as a "replace entire
> > > > > > attribute" operation.
> > > > > >
> > > > > > Without LARP, we do that overwrite in independent steps via an
> > > > > > intermediate INCOMPLETE state to allow two xattrs of the same
> > > > > > name to exist in the attr tree at the same time. IOWs, the
> > > > > > attr value overwrite is effectively a "set-swap-remove"
> > > > > > operation on two entirely independent xattrs, ensuring that if
> > > > > > we crash we always have either the old or new xattr visible.
> > > > > >
> > > > > > With LARP, we can remove the original attr first, thereby
> > > > > > avoiding the need for two versions of the xattr to exist in
> > > > > > the tree in the first place. However, we have to do these two
> > > > > > operations as a pair of linked independent operations. The
> > > > > > intent chain provides the linking, and requires us to log the
> > > > > > name and the value of the attr that we are overwriting in the
> > > > > > intent. Hence we can always recover the modification to
> > > > > > completion no matter where in the operation we fail.
> > > > > >
> > > > > > When it comes to a parent attr rename operation, we are
> > > > > > effectively doing two linked operations - remove the old attr,
> > > > > > set the new attr - on different attributes. Implementation
> > > > > > wise, it is exactly the same sequence as a "replace value"
> > > > > > operation, except for the fact that the new attr we add has a
> > > > > > different name.
> > > > > >
> > > > > > Hence the only real difference between the existing "attr
> > > > > > replace" and the intent chain we need for "parent attr rename"
> > > > > > is that we have to log two attr names instead of one.
> > > > > >
> > > > > To be clear, this would imply expanding xfs_attri_log_format to
> > > > > have another alfi_new_name_len field and another iovec for the
> > > > > attr intent, right? Does it cause issues to change the on disk
> > > > > log layout after the original has merged? Or is that ok for
> > > > > things that are still experimental? Thanks!
> > > >
> > > > I think we can get away with this quite easily without breaking
> > > > the existing experimental code.
> > > >
> > > > struct xfs_attri_log_format {
> > > > 	uint16_t	alfi_type;	/* attri log item type */
> > > > 	uint16_t	alfi_size;	/* size of this item */
> > > > 	uint32_t	__pad;		/* pad to 64 bit aligned */
> > > > 	uint64_t	alfi_id;	/* attri identifier */
> > > > 	uint64_t	alfi_ino;	/* the inode for this attr operation */
> > > > 	uint32_t	alfi_op_flags;	/* marks the op as a set or remove */
> > > > 	uint32_t	alfi_name_len;	/* attr name length */
> > > > 	uint32_t	alfi_value_len;	/* attr value length */
> > > > 	uint32_t	alfi_attr_filter;/* attr filter flags */
> > > > };
> > > >
> > > > We have a padding field in there that is currently all zeros.
> > > > Let's make that a count of the number of {name, value} tuples
> > > > that are appended to the format. i.e.
> > > >
> > > > struct xfs_attri_log_name {
> > > > 	uint32_t	alfi_op_flags;	/* marks the op as a set or remove */
> > > > 	uint32_t	alfi_name_len;	/* attr name length */
> > > > 	uint32_t	alfi_value_len;	/* attr value length */
> > > > 	uint32_t	alfi_attr_filter;/* attr filter flags */
> > > > };
> > > >
> > > > struct xfs_attri_log_format {
> > > > 	uint16_t	alfi_type;	/* attri log item type */
> > > > 	uint16_t	alfi_size;	/* size of this item */
> > > > 	uint8_t		alfi_attr_cnt;	/* count of name/val pairs */
> > > > 	uint8_t		__pad1;		/* pad to 64 bit aligned */
> > > > 	uint16_t	__pad2;		/* pad to 64 bit aligned */
> > > > 	uint64_t	alfi_id;	/* attri identifier */
> > > > 	uint64_t	alfi_ino;	/* the inode for this attr operation */
> > > > 	struct xfs_attri_log_name alfi_attr[]; /* attrs to operate on */
> > > > };
> > > >
> > > > Basically, the size and shape of the structure has not changed,
> > > > and if alfi_attr_cnt == 0 we just treat it as if
> > > > alfi_attr_cnt == 1 as the backwards compat code for the existing
> > > > code.
> > > >
> > > > And then we just have as many followup regions for name/val pairs
> > > > as are defined by the alfi_attr_cnt and alfi_attr[] parts of the
> > > > structure. Each attr can have a different operation performed on
> > > > them, and they can have different filters applied so they can
> > > > exist in different namespaces, too.
> > > >
> > > > So I don't think we need a new on-disk feature bit for this
> > > > enhancement - it definitely comes under the heading of "this
> > > > stuff is experimental, this is the sort of early structure
> > > > revision that EXPERIMENTAL is supposed to cover"....
> > >
> > > You might even call it "alfi_extra_names" to avoid the "0 means 1"
> > > stuff. ;)
> > >
> > > --D
> >
> > Oh, I just noticed these comments this morning when I sent out the
> > new attri/d patch. I'll add these changes to v2.
> > Please let me know if there's anything else you'd like me to change
> > from the v1. Thx!
> >
> > Allison
>
> Ok, so I am part way through coding this up, and I'm getting this
> feeling like this is not going to work out very well due to the size
> checks for the log formats:
>
> root@garnet:/home/achender/work_area/xfs-linux# git diff fs/xfs/libxfs/xfs_log_format.h fs/xfs/xfs_ondisk.h
> diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
> index f1ff52ebb982..5a4e700f32fc 100644
> --- a/fs/xfs/libxfs/xfs_log_format.h
> +++ b/fs/xfs/libxfs/xfs_log_format.h
> @@ -922,6 +922,13 @@ struct xfs_icreate_log {
>  					 XFS_ATTR_PARENT | \
>  					 XFS_ATTR_INCOMPLETE)
>  
> +struct xfs_attri_log_name {
> +	uint32_t	alfi_op_flags;	/* marks the op as a set or remove */
> +	uint32_t	alfi_name_len;	/* attr name length */
> +	uint32_t	alfi_value_len;	/* attr value length */
> +	uint32_t	alfi_attr_filter;/* attr filter flags */
> +};
> +
>  /*
>   * This is the structure used to lay out an attr log item in the
>   * log.
> @@ -929,14 +936,12 @@ struct xfs_icreate_log {
>  struct xfs_attri_log_format {
>  	uint16_t	alfi_type;	/* attri log item type */
>  	uint16_t	alfi_size;	/* size of this item */
> -	uint32_t	__pad;		/* pad to 64 bit aligned */
> +	uint8_t		alfi_extra_names;/* count of name/val pairs */
> +	uint8_t		__pad1;		/* pad to 64 bit aligned */
> +	uint16_t	__pad2;		/* pad to 64 bit aligned */
>  	uint64_t	alfi_id;	/* attri identifier */
>  	uint64_t	alfi_ino;	/* the inode for this attr operation */
> -	uint32_t	alfi_op_flags;	/* marks the op as a set or remove */
> -	uint32_t	alfi_name_len;	/* attr name length */
> -	uint32_t	alfi_value_len;	/* attr value length */
> -	uint32_t	alfi_attr_filter;/* attr filter flags */
> +	struct xfs_attri_log_name alfi_attr[]; /* attrs to operate on

What's the length of this VLA? 1 for a normal SET or REPLACE operation,
and 2 for the "rename and replace value" operation? If so, why do we
need two xfs_attri_log_name structures?
The old value is unimportant, so we only need one alfi_value_len per
operation. Each xfs_attri_log_format only describes one change, so it
only needs one alfi_op_flags per op. For now I also don't think
attributes should be able to jump namespaces, so we'd only need one
alfi_attr_filter per op as well.

*lightbulb comes on*

Oops, I think I led you astray with my unfortunate comment. :(

IOWs, the only change to struct xfs_attri_log_format is:

-	uint32_t	__pad;		/* pad to 64 bit aligned */
+	uint32_t	alfi_new_namelen;/* new attr name length */

and the rest of the changes in "[PATCH] xfs: Add new name to attri/d"
are more or less fine as is. I'll go reply to that before I get back to
Dave's log accounting stuff.

--D

> */
>  };
>  
>  struct xfs_attrd_log_format {
> diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
> index 3e7f7eaa5b96..c040eeb88def 100644
> --- a/fs/xfs/xfs_ondisk.h
> +++ b/fs/xfs/xfs_ondisk.h
> @@ -132,7 +132,7 @@ xfs_check_ondisk_structs(void)
>  	XFS_CHECK_STRUCT_SIZE(struct xfs_inode_log_format,	56);
>  	XFS_CHECK_STRUCT_SIZE(struct xfs_qoff_logformat,	20);
>  	XFS_CHECK_STRUCT_SIZE(struct xfs_trans_header,		16);
> -	XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format,	48);
> +	XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format,	24);
>  	XFS_CHECK_STRUCT_SIZE(struct xfs_attrd_log_format,	16);
>  
>  	/* parent pointer ioctls */
> root@garnet:/home/achender/work_area/xfs-linux#
>
> If the on disk size check thinks the format is 24 bytes, and then we
> surprise pack an array of structs after it, isn't that going to run
> over the next item? I think anything dynamic like this has to be an
> nvec. Maybe we leave the existing alfi_* as they are so the size
> doesn't change, and then if we have a value in alfi_extra_names, then
> we have an extra nvec that has the array in it. I think that would
> work.
>
> FWIW, an alternate solution would be to use the pad for a second name
> length, and then we get a patch that's very similar to the one I sent
> out last Tues, but backward compatible. Though it does eat the
> remaining pad and wouldn't be as flexible, I can't think of an attr op
> that would need more than two names either?
>
> Let me know what people think. Thanks!
> Allison
>
> > > > Cheers,
> > > >
> > > > Dave.
> > > > --
> > > > Dave Chinner
> > > > david@xxxxxxxxxxxxx