On Tue, Jul 18, 2017 at 08:14:07AM +1000, Dave Chinner wrote: > On Mon, Jul 17, 2017 at 10:48:16AM -0400, Brian Foster wrote: > > On Fri, Jul 14, 2017 at 03:46:58PM -0700, Darrick J. Wong wrote: > > > On Fri, Jul 14, 2017 at 03:07:41PM -0400, Brian Foster wrote: > > > From what I can tell, he seems to have taken Mark's patches to pass dir > > > offsets back from the dirent manipulation functions and then started > > > writing a simple implementation of stuffing the attrs into the attr > > > tree. > > Yup, that's the idea. > > > > > FWIW, I think there was some mention of porting some of the operations > > > > over to the deferred ops infrastructure, but it's not clear to me off > > > > the top of my head how important (or appropriate) that is. > > > > > > Most of the users of xfs_trans_roll seem to be buried in the xattr code. > > > For example, _attr_set can end up rolling a transaction after converting > > > an attr fork from short format to leaf format before retrying the attr > > > add operation, but we don't use log redo items (er, defer_ops items) to > > > make sure log recovery will restart the attr add operation if we crash > > > before the final _trans_commit after calling _attr_{leaf,node}_addname. > > > > > > In the case of a regular xattr set operation this wouldn't have been a > > > big deal because the fsetxattr call wouldn't have returned, so all the > > > user could possibly see is an inode with a perhaps unnecessarily large > > > xattr fork. Now that we need to set xattrs in the same transaction > > > chain as a directory operation, it becomes very important that log > > > recovery gain the ability to resume an xattr add operation no matter > > > where the log stopped. Creating a directory can become this longwinded > > > pile of updates: > > > > > > > Ok, thanks for the summary. So we have some transaction ordering down in > > the xattr code that might no longer be sufficient to provide atomicity > > in the context of parent pointers (via directory operations). > > We already have xattr operations that are not atomic with directory > entry creation - security xattrs (default ACL, etc). This can mean > that after a crash and recovery, we have a newly created file > without any security xattr on it. > > This is a long standing problem - it bites SGI's HSM (DMF) all the > time, so rather than fixing the problem 10+ years ago, they added a > post-boot inode scan (via bulkstat) to find inodes that didn't have > a DMF xattr. <nod> That reminds me, Ted was asking what _check_xfs_test_fs (in xfstests) does. Judging from the nonexistent xfs_repair_ipaths command, I'm guessing that program fixed up pptr xattrs if they were broken/missing/whatever? > The original parent pointer patchset also had this problem the > moment xattrs were required (i.e. first hardlink) - the creation of > the pp xattr was not atomic with the dirent creation. To solve this > problem, I started by moving the xattr creation into the transaction > context of the dirent. > > > I see that > > the current patch series increases the transaction size to accommodate > > parent pointers, but that doesn't help us deal with those situations > > where the transaction is rolled into a new one. > > Right, I hadn't got that far. The plan effective was to get a proof > of concept running with the xattr creation within the directory > creation xact context and get that working, knowing that it still > wasn't properly atomic. This was done before we had deferred ops, > and my thinking at the time was to add an intent/done pair of ops > to tell recovery it needed to add an xattr to the dirent once the > xattr was inside the dirent context. i.e. > > - move xattr inside dirent ctx > - add pp intent > - roll transaction > - add pp xattr > - roll final xattr transaction rather than commit > - add pp done > - commit dir ctx transaction. > > This doesn't help us with security context xattrs, though, so > when deferred ops came along, it made much more sense to me to > make the adding of an xattr a deferred op so we could use the same > mechanism for adding all the required xattrs to the inode atomically > and so have a mechanism for recovery to replay them appropriately. <nod> > > > - Allocate directory block, map into data fork, chain rmap update, add > > > directory entry, chain pptr update; > > > - Add rmap entry for new directory block (more chaining to fix > > > freelist), finish first rmap update; > > > - Allocate xattr block for short->leaf conversion, map into attr fork, > > > chain rmap update; chain xattr add operation; > > > - Add rmap entry for new xattr block (more chaining to fix freelist), > > > finish second rmap update; > > > - Add xattr entry to file, finish xattr add operation; > > > - Finish pptr update. > > > > > > Basically, I think someone's going to have to go audit all the uses of > > > xfs_trans_roll in the attr code to figure out which operations need redo > > > items, and how to cram everything toegether into a single xfs_defer_ops, > > > rather than sprinkling them around the attr code like we do now, because > > > redo items cannot be deferred from one defer_ops to another. > > Probably should handle that when we dup the transaction just before > the roll, right? Something like that. Ideally, I think I'd like to have two xattr redo items: "add key => value" and "remove key", where "add key" would first ensure xattr space, roll, and then actually add the attr & value. A big problem with that scheme is that we then have to log the value, which could be ... tough. --D > > Cheers, > > Dave. > -- > Dave Chinner > david@xxxxxxxxxxxxx > -- > To unsubscribe from this list: send the line "unsubscribe linux-xfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html