On Mon, Jul 17, 2017 at 10:48:16AM -0400, Brian Foster wrote: > On Fri, Jul 14, 2017 at 03:46:58PM -0700, Darrick J. Wong wrote: > > On Fri, Jul 14, 2017 at 03:07:41PM -0400, Brian Foster wrote: > > From what I can tell, he seems to have taken Mark's patches to pass dir > > offsets back from the dirent manipulation functions and then started > > writing a simple implementation of stuffing the attrs into the attr > > tree. Yup, that's the idea. > > > FWIW, I think there was some mention of porting some of the operations > > > over to the deferred ops infrastructure, but it's not clear to me off > > > the top of my head how important (or appropriate) that is. > > > > Most of the users of xfs_trans_roll seem to be buried in the xattr code. > > For example, _attr_set can end up rolling a transaction after converting > > an attr fork from short format to leaf format before retrying the attr > > add operation, but we don't use log redo items (er, defer_ops items) to > > make sure log recovery will restart the attr add operation if we crash > > before the final _trans_commit after calling _attr_{leaf,node}_addname. > > > > In the case of a regular xattr set operation this wouldn't have been a > > big deal because the fsetxattr call wouldn't have returned, so all the > > user could possibly see is an inode with a perhaps unnecessarily large > > xattr fork. Now that we need to set xattrs in the same transaction > > chain as a directory operation, it becomes very important that log > > recovery gain the ability to resume an xattr add operation no matter > > where the log stopped. Creating a directory can become this longwinded > > pile of updates: > > > > Ok, thanks for the summary. So we have some transaction ordering down in > the xattr code that might no longer be sufficient to provide atomicity > in the context of parent pointers (via directory operations). We already have xattr operations that are not atomic with directory entry creation - security xattrs (default ACL, etc). This can mean that after a crash and recovery, we have a newly created file without any security xattr on it. This is a long standing problem - it bites SGI's HSM (DMF) all the time, so rather than fixing the problem 10+ years ago, they added a post-boot inode scan (via bulkstat) to find inodes that didn't have a DMF xattr. The original parent pointer patchset also had this problem the moment xattrs were required (i.e. first hardlink) - the creation of the pp xattr was not atomic with the dirent creation. To solve this problem, I started by moving the xattr creation into the transaction context of the dirent. > I see that > the current patch series increases the transaction size to accommodate > parent pointers, but that doesn't help us deal with those situations > where the transaction is rolled into a new one. Right, I hadn't got that far. The plan effective was to get a proof of concept running with the xattr creation within the directory creation xact context and get that working, knowing that it still wasn't properly atomic. This was done before we had deferred ops, and my thinking at the time was to add an intent/done pair of ops to tell recovery it needed to add an xattr to the dirent once the xattr was inside the dirent context. i.e. - move xattr inside dirent ctx - add pp intent - roll transaction - add pp xattr - roll final xattr transaction rather than commit - add pp done - commit dir ctx transaction. This doesn't help us with security context xattrs, though, so when deferred ops came along, it made much more sense to me to make the adding of an xattr a deferred op so we could use the same mechanism for adding all the required xattrs to the inode atomically and so have a mechanism for recovery to replay them appropriately. > > - Allocate directory block, map into data fork, chain rmap update, add > > directory entry, chain pptr update; > > - Add rmap entry for new directory block (more chaining to fix > > freelist), finish first rmap update; > > - Allocate xattr block for short->leaf conversion, map into attr fork, > > chain rmap update; chain xattr add operation; > > - Add rmap entry for new xattr block (more chaining to fix freelist), > > finish second rmap update; > > - Add xattr entry to file, finish xattr add operation; > > - Finish pptr update. > > > > Basically, I think someone's going to have to go audit all the uses of > > xfs_trans_roll in the attr code to figure out which operations need redo > > items, and how to cram everything toegether into a single xfs_defer_ops, > > rather than sprinkling them around the attr code like we do now, because > > redo items cannot be deferred from one defer_ops to another. Probably should handle that when we dup the transaction just before the roll, right? Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html