On Tue, 2022-05-03 at 17:40 +1000, Dave Chinner wrote: > On Thu, Apr 28, 2022 at 12:02:17AM -0700, Alli wrote: > > On Thu, 2022-04-14 at 19:44 +1000, Dave Chinner wrote: > > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > > > We can't use the same algorithm for replacing an existing > > > attribute > > > when logging attributes. The existing algorithm is essentially: > > > > > > 1. create new attr w/ INCOMPLETE > > > 2. atomically flip INCOMPLETE flags between old + new attribute > > > 3. remove old attr which is marked w/ INCOMPLETE > > > > > > This algorithm guarantees that we see either the old or new > > > attribute, and if we fail after the atomic flag flip, we don't > > > have > > > to recover the removal of the old attr because we never see > > > INCOMPLETE attributes in lookups. > .... > > > diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c > > > index 39af681897a2..a46379a9e9df 100644 > > > --- a/fs/xfs/xfs_attr_item.c > > > +++ b/fs/xfs/xfs_attr_item.c > > > @@ -490,9 +490,14 @@ xfs_attri_validate( > > > if (attrp->__pad != 0) > > > return false; > > > > > > - /* alfi_op_flags should be either a set or remove */ > > > - if (op != XFS_ATTR_OP_FLAGS_SET && op != > > > XFS_ATTR_OP_FLAGS_REMOVE) > > > + switch (op) { > > > + case XFS_ATTR_OP_FLAGS_SET: > > > + case XFS_ATTR_OP_FLAGS_REMOVE: > > > + case XFS_ATTR_OP_FLAGS_REPLACE: > > > + break; > > > + default: > > > return false; > > > + } > > > > > > if (attrp->alfi_value_len > XATTR_SIZE_MAX) > > > return false; > > > @@ -553,11 +558,27 @@ xfs_attri_item_recover( > > > args->namelen = attrp->alfi_name_len; > > > args->hashval = xfs_da_hashname(args->name, args->namelen); > > > args->attr_filter = attrp->alfi_attr_flags; > > > + args->op_flags = XFS_DA_OP_RECOVERY; > > > > > > - if (attrp->alfi_op_flags == XFS_ATTR_OP_FLAGS_SET) { > > > + switch (attr->xattri_op_flags) { > > > + case XFS_ATTR_OP_FLAGS_SET: > > > + case XFS_ATTR_OP_FLAGS_REPLACE: > > > args->value = attrip->attri_value; > > > args->valuelen = attrp->alfi_value_len; > > > args->total = xfs_attr_calc_size(args, &local); > > > + if (xfs_inode_hasattr(args->dp)) > > I ran into a test failure and tracked it down to the above line. I > > suspect because xfs_inode_hasattr only checks to see if the inode > > has > > an attr fork, it doesnt actually check to see if it has the attr > > we're > > replacing. > > Right, that was intentional. It is based on the fact that if we > are recovering a set or a replace operation, we have to remove the > INCOMPLETE xattr first. However, if the attr fork is empty, there's > no INCOMPLETE xattr to remove, and so we can just go straight to the > set operation to create the new value. > > Hmmm - what was the failure? Was it a null pointer dereference > on ip->i_afp? I wonder if you hit the corner case where attr removal > can remove the attr fork, and that's when it crashed and we've tried > to recover from? No, the actual shutdown was from the error inject that the test case uses. The unexpected part was a set operation returning -ENODATA because we incorrectly fell down the rename path. I suspect the reason the parent pointers exposed it was because the presence of the parent pointer caused the attr fork to not be empty and so xfs_inode_hasattr succeeds. > > Oh, I think I might have missed a case there. If you look at > xfs_attr_sf_removename() I added a case to avoid removing the attr > fork when XFS_DA_OP_RENAME is set because we don't want to remove it > when we are about to add to it again. But I didn't add the same > logic to xfs_attr3_leaf_to_shortform() which can also trash the attr > fork if the last attr we remove from the attr fork is larger than > would fit in a sf attr fork. Hence we go straight from leaf form to > no attr fork at all. > > Ok, that's definitely a bug, I'll need to fix that, and it could be > the cause of this issue as removing the attr fork will set > forkoff to 0 and so the inode will not have an attr fork > instantiated when it is read into memory... > > Ah, that could be it then. The last failing test case is: expanding the fork into node form, setting the inject, and attempting a rename. The correct result should be that it finds the attr correctly renamed, but instead finds no attr. So that sounds like what you've described. I will wait for your fix and then retest. Thx for all your help here! Allison > > So we fall into the replace code path where it should have > > been the set code path. If I replace it with the below line, the > > failure is resolved: > > > > if (attr->xattri_op_flags == XFS_ATTR_OP_FLAGS_REPLACE) > > > > I only encountered this bug after running with parent pointers > > though > > which generates a lot more activity, but I figure it's not a bad > > idea > > to catch things early. There's one more test failure it's picking > > up, > > though I havnt figured out the cause just yet. > > Yup, that's a good idea. > > > The rest looks good though, I see the below lines address the issue > > of > > the states needing to be initialized in the replay paths, so that > > addresses the concerns in patches 4 and 13. Thanks for all the > > larp > > improvements! > > I'm going to try to move them up into patches 4 and 13, so that > recovery at least tries to function correctly as we move through the > patch set. > > Cheers, > > Dave.