On Tue, Aug 15, 2017 at 06:44:30PM +1000, Dave Chinner wrote: > On Tue, Aug 15, 2017 at 03:18:58AM -0300, Ernesto A. Fernández wrote: > > Hi Dave, thanks for the review! > > > > On Tue, Aug 15, 2017 at 12:00:18PM +1000, Dave Chinner wrote: > > > On Mon, Aug 14, 2017 at 05:18:10AM -0300, Ernesto A. Fernández wrote: > > > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c > > > > index de7b9bd..c3193e1 100644 > > > > --- a/fs/xfs/libxfs/xfs_attr.c > > > > +++ b/fs/xfs/libxfs/xfs_attr.c > > > > @@ -395,6 +395,7 @@ xfs_attr_remove( > > > > struct xfs_defer_ops dfops; > > > > xfs_fsblock_t firstblock; > > > > int error; > > > > + int noattr = 0; > > > > > > bool. > > > > > > And make it "bool has_attr = true" to avoid the nasty double > > > negative of "!noattr" later on. > > > > I called it noattr because the error was -ENOATTR, but you are right, > > has_attr is better. I assume you would prefer mode_changed to be a bool > > as well? > > > > > > > > > > > > > XFS_STATS_INC(mp, xs_attr_remove); > > > > > > > > @@ -438,7 +439,12 @@ xfs_attr_remove( > > > > xfs_trans_ijoin(args.trans, dp, 0); > > > > > > > > if (!xfs_inode_hasattr(dp)) { > > > > - error = -ENOATTR; > > > > + /* > > > > + * Commit even if there is no attr to remove, because when > > > > + * setting ACLs the caller may be changing the mode in the > > > > + * same transaction. > > > > + */ > > > > + noattr = 1; > > > > } else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) { > > > > ASSERT(dp->i_afp->if_flags & XFS_IFINLINE); > > > > error = xfs_attr_shortform_remove(&args); > > > > @@ -458,7 +464,7 @@ xfs_attr_remove( > > > > if (mp->m_flags & XFS_MOUNT_WSYNC) > > > > xfs_trans_set_sync(args.trans); > > > > > > > > - if ((flags & ATTR_KERNOTIME) == 0) > > > > + if ((flags & ATTR_KERNOTIME) == 0 && !noattr) > > > > xfs_trans_ichgtime(args.trans, dp, XFS_ICHGTIME_CHG); > > > > > > Why? > > > > Because we don't want the ctime to change when you try to delete an > > attribute that didn't exist in the first place. At least that's how > > it went before, so I tried to keep it the same. > > Which means the logic is convoluted and no longer obvious as to what > it does. i.e. the noattr case is running a transaction that dirties > an inode that no modifications were made to. Alarm bells right > there. > > > > > @@ -257,16 +242,20 @@ xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type) > > > > return error; > > > > > > > > if (type == ACL_TYPE_ACCESS) { > > > > - umode_t mode; > > > > - > > > > - error = posix_acl_update_mode(inode, &mode, &acl); > > > > - if (error) > > > > - return error; > > > > - error = xfs_set_mode(inode, mode); > > > > + error = posix_acl_update_mode(inode, &inode->i_mode, &acl); > > > > if (error) > > > > return error; > > > > + inode->i_ctime = current_time(inode); > > > > + XFS_STATS_INC(XFS_I(inode)->i_mount, xs_ig_attrchg); > > > > + mode_changed = 1; > > > > } > > > > > > And this doesn't. > > > > > > So this breaks all our rules for inode metadata updates, both for > > > runtime access serialisation and for crash consistency. > > > > I'm only making changes to the vfs inode, and xfs_set_acl() should always > > be called with i_rwsem held for writing. I think that should be enough to > > serialize access to the vfs inode fields. > > It's not. The i_rwsem does not protect XFS inode metadata, which is > what needs changing here. i.e. we change both the VFS inode and > the corresponding XFS inode metadata at the same time, under the > same XFS_ILOCK_EXCL lock, only inside a transaction context. > > That's the rules, no exceptions. > > > I'm sorry, I'm not sure I follow the crash consistency part. This is not > > writing anything to disk. > > xfs_set_mode() runs a transaction that is journalled to change the > inode mode. You break in-memory consistency by modifying the VFS > inode state directly without holding the correct locks (see > xfs_setattr_mode()), and you break the transactional change model > (i.e. on-disk journal consistency and on-disk metadata change > ordering) by making metadata changes this outside transaction > contexts and without holding the correct locks. > > > My expectation was that the changes I made to > > the vfs inode would be attached to the transaction later, when calling > > xfs_trans_log_inode() in xfs_attr_set() or xfs_attr_remove(). Would that > > not work as intended? > > No, it wouldn't. You have to follow the filesystem specific update > rules to change inode metadata, not make up your own.... > > > I was mostly imitating how btrfs sets an ACL, so I > > could be misunderstanding how xfs does journaling. > > ... and btrfs has a completely different transaction model and > implementation to XFS. IOWs, trying to do what works for btrfs in > XFS code will break XFS. And vice versa. > > > > > set_acl: > > > > - return __xfs_set_acl(inode, acl, type); > > > > + error = __xfs_set_acl(inode, acl, type); > > > > + if (error && mode_changed) { > > > > + inode->i_mode = old_mode; > > > > + inode->i_ctime = old_ctime; > > > > + XFS_STATS_DEC(XFS_I(inode)->i_mount, xs_ig_attrchg); > > > > + } > > > > > > Same here. > > > > i_rwsem would still be held here. My reasoning is that, since there was > > an error, no changes were actually committed; so there would be no need > > to commit the restoration either, just restore the old in-memory vfs inode > > fields. Once again, this is inspired by btrfs, so I suppose it may fail > > here in some way I can't see. > > > > (I should probably clarify I did run tests before submitting) > > When you treat the code as a black box for testing, then end result > might appear to be what you want. That doesn't mean the > implementation is correct or that hidden transient states within the > block box are incorrect or invalid. > > > > IOWs, you can't just remove a transactional modification of inode > > > state and replace it with in-memory VFS inode level changes. You > > > need to do the changes in transaction context and with the correct > > > locks held. > > > > > > IOWs, if you need to undo the mode changes xfs_set_mode() did, you > > > need to call xfs_set_mode() again. And because modifications have > > > > But that could fail as well, and how would we handle that? > > If the xattr change was not a fatal error (which will shut down the > filesystem), then calling xfs_set_mode() should work just fine as it > doesn't require any new space to be allocated (XFS != BTRFS - we can > run overwriting transactions even at ENOSPC). > > And you need to run the mode undo as a separate transaction > because.... > > > If this bug > > is to be fixed the whole thing needs to be a single transaction. > > .... this is simply not possible. Why? Because __xfs_set_acl() can > run up three separate transactions internally in adding the new > xattr. That means your unlogged vfs indoe change has already been > logged to the journal and so we *require* a transaction to overwrite > the new mode that has already been committed to the journal. It's clear that I don't understand xfs, and I'm causing more trouble than I'm helping. I should leave this issue to you guys. > But I have to ask - why do we even need to modify the mode first? > Why not change the ACL first, then modify the mode+timestamp? If > setting the ACL fails, then we don't have anything to undo and all > is good.... I intended for the mode to be committed as part of the same transaction that sets or removes the ACL. In my mind making the changes later, as part of a separate transaction, would have meant that a crash between the two left the filesystem in an inconsistent state, with a new ACL but without the corresponding mode bits. Of course that was based on a poor understanding of xfs journaling, so now I don't know. > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx Thanks, Ernest -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html