On Tue, Aug 15, 2017 at 01:23:07PM +1000, Dave Chinner wrote: > On Mon, Aug 14, 2017 at 04:56:53PM -0700, Darrick J. Wong wrote: > > The biggest complaint I have about this patch series is that we end up > > using separate transactions for each phase of the link/unlink > > operations. There's no guarantee that once we start the process of > > expand-dir, add-name-to-dir, log-inode-cores, expand-attr, > > add-pptr-to-attrs that we actually get all the way to the end of that > > process, particularly if we crash in the middle and have to recover the > > log coming back up. > > Yes, that's a problem that we had not been solved at the point in > time I wrote that patchset. After making the parent pointer > attributes be added and removed correctly, then next thing to do was > to make them atomic and replayable. And that's the bit I never got > to. > > > Obviously, back in 2014 when Dave started on these patches there was no > > defer_ops, so a lot of effort goes into passing parameters up and down > > the stack and fiddling with the transaction reservation type > > declarations in order to guarantee that we can shove everything into a > > single transaction. Now that we have a more general mechanism to split > > complex updates into a chain of smaller updates (and make log recovery > > pick up where we left off) it's time to figure out how to make attr (and > > maybe dir) updates a deferrable operation. > > Right. We've got the infrastructure now, but it's nasty to retrofit > to all the attr code. > > > This adds complexity to the log because we have to define an "attr add > > KEY:VALUE" and a "attr delete KEY" deferred op that hooks into the > > existing attr code. For parent pointers it's not a big deal to log > > values since they're not big, but for the general case this idea needs > > more thought, or a decision that we'll just keep doing things the same > > way. > > The problem with this is that it requires us to log the entire > attribute value, which could be 64k in size. That used to be a big > issue because log bandwidth was at a premium. That isn't so much a > problem now, so logging the attr value and getting rid of the > unlogged sync write for remote attr updates is the way I think we > should head. Just had another thought about this - if we rewrite any code to use a different set of transaction structures, we're going to need to use log incompat flags to identify each of the changed transaction sets that require redo item support to replay. That's not a huge deal - it just means that people taking filesystems back to old kernels will need to make sure the log is clean. i.e. mount sets, unmount/remount-ro clears the relevant log incompat flags. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html