Hi Dave, thanks for the review! On Tue, Aug 15, 2017 at 12:00:18PM +1000, Dave Chinner wrote: > On Mon, Aug 14, 2017 at 05:18:10AM -0300, Ernesto A. Fernández wrote: > > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c > > index de7b9bd..c3193e1 100644 > > --- a/fs/xfs/libxfs/xfs_attr.c > > +++ b/fs/xfs/libxfs/xfs_attr.c > > @@ -395,6 +395,7 @@ xfs_attr_remove( > > struct xfs_defer_ops dfops; > > xfs_fsblock_t firstblock; > > int error; > > + int noattr = 0; > > bool. > > And make it "bool has_attr = true" to avoid the nasty double > negative of "!noattr" later on. I called it noattr because the error was -ENOATTR, but you are right, has_attr is better. I assume you would prefer mode_changed to be a bool as well? > > > > > XFS_STATS_INC(mp, xs_attr_remove); > > > > @@ -438,7 +439,12 @@ xfs_attr_remove( > > xfs_trans_ijoin(args.trans, dp, 0); > > > > if (!xfs_inode_hasattr(dp)) { > > - error = -ENOATTR; > > + /* > > + * Commit even if there is no attr to remove, because when > > + * setting ACLs the caller may be changing the mode in the > > + * same transaction. > > + */ > > + noattr = 1; > > } else if (dp->i_d.di_aformat == XFS_DINODE_FMT_LOCAL) { > > ASSERT(dp->i_afp->if_flags & XFS_IFINLINE); > > error = xfs_attr_shortform_remove(&args); > > @@ -458,7 +464,7 @@ xfs_attr_remove( > > if (mp->m_flags & XFS_MOUNT_WSYNC) > > xfs_trans_set_sync(args.trans); > > > > - if ((flags & ATTR_KERNOTIME) == 0) > > + if ((flags & ATTR_KERNOTIME) == 0 && !noattr) > > xfs_trans_ichgtime(args.trans, dp, XFS_ICHGTIME_CHG); > > Why? Because we don't want the ctime to change when you try to delete an attribute that didn't exist in the first place. At least that's how it went before, so I tried to keep it the same. > > > > > /* > > @@ -468,6 +474,9 @@ xfs_attr_remove( > > error = xfs_trans_commit(args.trans); > > xfs_iunlock(dp, XFS_ILOCK_EXCL); > > > > + if (!error && noattr) > > + error = -ENOATTR; > > + > > return error; > > > > out: > > diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c > > index 7034e17..1fe4363 100644 > > --- a/fs/xfs/xfs_acl.c > > +++ b/fs/xfs/xfs_acl.c > > @@ -226,28 +226,13 @@ __xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type) > > return error; > > } > > > > -static int > > -xfs_set_mode(struct inode *inode, umode_t mode) > > -{ > > - int error = 0; > > - > > - if (mode != inode->i_mode) { > > - struct iattr iattr; > > - > > - iattr.ia_valid = ATTR_MODE | ATTR_CTIME; > > - iattr.ia_mode = mode; > > - iattr.ia_ctime = current_time(inode); > > - > > - error = xfs_setattr_nonsize(XFS_I(inode), &iattr, XFS_ATTR_NOACL); > > - } > > - > > - return error; > > -} > > Ok, so that does an atomic, transactional change to the inode > mode.... > > > int > > xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type) > > { > > int error = 0; > > + int mode_changed = 0; > > + umode_t old_mode = inode->i_mode; > > + struct timespec old_ctime = inode->i_ctime; > > > > if (!acl) > > goto set_acl; > > @@ -257,16 +242,20 @@ xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type) > > return error; > > > > if (type == ACL_TYPE_ACCESS) { > > - umode_t mode; > > - > > - error = posix_acl_update_mode(inode, &mode, &acl); > > - if (error) > > - return error; > > - error = xfs_set_mode(inode, mode); > > + error = posix_acl_update_mode(inode, &inode->i_mode, &acl); > > if (error) > > return error; > > + inode->i_ctime = current_time(inode); > > + XFS_STATS_INC(XFS_I(inode)->i_mount, xs_ig_attrchg); > > + mode_changed = 1; > > } > > And this doesn't. > > So this breaks all our rules for inode metadata updates, both for > runtime access serialisation and for crash consistency. > I'm only making changes to the vfs inode, and xfs_set_acl() should always be called with i_rwsem held for writing. I think that should be enough to serialize access to the vfs inode fields. I'm sorry, I'm not sure I follow the crash consistency part. This is not writing anything to disk. My expectation was that the changes I made to the vfs inode would be attached to the transaction later, when calling xfs_trans_log_inode() in xfs_attr_set() or xfs_attr_remove(). Would that not work as intended? I was mostly imitating how btrfs sets an ACL, so I could be misunderstanding how xfs does journaling. > > set_acl: > > - return __xfs_set_acl(inode, acl, type); > > + error = __xfs_set_acl(inode, acl, type); > > + if (error && mode_changed) { > > + inode->i_mode = old_mode; > > + inode->i_ctime = old_ctime; > > + XFS_STATS_DEC(XFS_I(inode)->i_mount, xs_ig_attrchg); > > + } > > Same here. i_rwsem would still be held here. My reasoning is that, since there was an error, no changes were actually committed; so there would be no need to commit the restoration either, just restore the old in-memory vfs inode fields. Once again, this is inspired by btrfs, so I suppose it may fail here in some way I can't see. (I should probably clarify I did run tests before submitting) > IOWs, you can't just remove a transactional modification of inode > state and replace it with in-memory VFS inode level changes. You > need to do the changes in transaction context and with the correct > locks held. > > IOWs, if you need to undo the mode changes xfs_set_mode() did, you > need to call xfs_set_mode() again. And because modifications have But that could fail as well, and how would we handle that? If this bug is to be fixed the whole thing needs to be a single transaction. That's why I removed the xfs_set_mode() function in the first place. > been made, the new ctime should remain in place, too. > > Of course, this still isn't atomic from a runtime access POV, but > that's substantially harder to provide than just undoing the mode > change on ACL attribute save error. > > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx Thanks, Ernest -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html