On Mon, Sep 18, 2023 at 08:33:35PM -0700, Darrick J. Wong wrote:
> On Mon, Sep 18, 2023 at 03:48:38PM +1000, Dave Chinner wrote:
> > It is only when we are trying to modify something that corruption
> > becomes a problem with fatal consequences. Once we've made a
> > modification, the in-memory state is different to the on-disk state
> > and whilst we are in that state any corruption we discover becomes
> > fatal. That is because there is no way to reconcile the changes
> > we've already made in memory with what is on-disk - we don't know
> > that the in-memory changes are good because we tripped over
> > corruption, and so we must not propagate bad in-memory state and
> > metadata to disk over the top of what may still be uncorrupted
> > metadata on disk.
>
> It'd be a massive effort, but wouldn't it be fun if one could attach
> defer ops to a transaction that updated incore state on commit but
> otherwise never appeared on disk?
>
> Let me cogitate on that during part 2 of vacation...

Sure, I'm interested to see what you might come up with.

My thoughts on rollback of dirty transactions come from a different
perspective.

Conceptually, being able to roll back individual transactions isn't
that difficult. All it takes is a bit more memory and CPU - when we
join the item to the transaction we take a copy of the item we are
about to modify.

If we then cancel a dirty transaction, we roll back all the dirty
items to their original state before we unlock them. This works fine
for all the on-disk stuff we track in log items.

I have vague thoughts about how this could potentially be tied into
the shadow buffers we already use for keeping a delta copy of all the
committed in-memory changes in the CIL that we haven't yet committed
to the journal - that's actually the entire delta between what is on
disk and what we've changed prior to the current transaction we are
cancelling.

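
As a rough illustration of the copy-on-join idea above, here is a
minimal userspace sketch. It is not XFS code: struct demo_item, struct
demo_trans and all the demo_* functions are made-up names for this
example only, and real log items, locking and the CIL are left out
entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a log item; not the real struct xfs_log_item. */
struct demo_item {
	void	*data;		/* live in-memory object being modified */
	size_t	size;		/* size of that object in bytes */
	void	*saved;		/* pre-modification copy, taken at join time */
	bool	dirty;		/* set once the transaction modifies the item */
};

#define DEMO_MAX_ITEMS	8

/* Hypothetical stand-in for a transaction; not the real struct xfs_trans. */
struct demo_trans {
	struct demo_item	*items[DEMO_MAX_ITEMS];
	int			nr_items;
};

/* Join an item to the transaction, snapshotting its current contents. */
int demo_trans_join(struct demo_trans *tp, struct demo_item *ip)
{
	if (tp->nr_items >= DEMO_MAX_ITEMS)
		return -1;
	ip->saved = malloc(ip->size);
	if (!ip->saved)
		return -1;
	memcpy(ip->saved, ip->data, ip->size);
	ip->dirty = false;
	tp->items[tp->nr_items++] = ip;
	return 0;
}

/* Record that the transaction has modified the item. */
void demo_trans_log(struct demo_item *ip)
{
	ip->dirty = true;
}

/* Drop the snapshot; a real implementation would unlock the item here. */
void demo_item_release(struct demo_item *ip)
{
	free(ip->saved);
	ip->saved = NULL;
	ip->dirty = false;
}

/* Cancel: restore every dirty item from its snapshot, then release it. */
void demo_trans_cancel(struct demo_trans *tp)
{
	for (int i = 0; i < tp->nr_items; i++) {
		struct demo_item *ip = tp->items[i];

		assert(ip->saved != NULL);
		if (ip->dirty)
			memcpy(ip->data, ip->saved, ip->size);
		demo_item_release(ip);
	}
	tp->nr_items = 0;
}

/* Commit: keep the modifications and just drop the snapshots. */
void demo_trans_commit(struct demo_trans *tp)
{
	for (int i = 0; i < tp->nr_items; i++)
		demo_item_release(tp->items[i]);
	tp->nr_items = 0;
}
```

The cost is one extra allocation and memcpy per joined item, which is
the "bit more memory and CPU" mentioned above. Note the sketch only
restores the bytes of the joined items themselves; it knows nothing
about state that lives outside them.
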
Hence, in theory, a rollback for a dirty log item is simply "read it
from disk again, copy the CIL shadow buffer delta into it".

However, the complexity comes with trying to roll back associated
in-memory state changes that we don't track as log items, e.g. incore
extent list changes, in-memory inode flag state (e.g. XFS_ISTALE),
etc. That's where all the hard problems to solve lie, I think.

Another problem is how do we roll back from the middle of an intent
(defer ops) chain? We have to complete that chain for things to end up
consistent on disk, so we can't just cancel the current transaction
and say we are done and everything is clean. Maybe that's what you are
thinking of here - each chain has an "undo" intent chain that can roll
back all the changes already made?

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx