Re: [PATCH v3] xfs: introduce protection for drop nlink

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 20 Sep 2023 15:53:40 +1000

On Mon, Sep 18, 2023 at 08:33:35PM -0700, Darrick J. Wong wrote:
> On Mon, Sep 18, 2023 at 03:48:38PM +1000, Dave Chinner wrote:
> > It is only when we are trying to modify something that corruption
> > becomes a problem with fatal consequences. Once we've made a
> > modification, the in-memory state is different to the on-disk state
> > and whilst we are in that state any corruption we discover becomes
> > fatal. That is because there is no way to reconcile the changes
> > we've already made in memory with what is on-disk - we don't know
> > that the in-memory changes are good because we tripped over
> > corruption, and so we must not propagate bad in-memory state and
> > metadata to disk over the top of what may be still be uncorrupted
> > metadata on disk.
> 
> It'd be a massive effort, but wouldn't it be fun if one could attach
> defer ops to a transaction that updated incore state on commit but
> otherwise never appeared on disk?
>
> Let me cogitate on that during part 2 of vacation...

Sure, I'm interested to see what you might come up with.

My thoughts on rollback of dirty transactions come from a different
perspective.

Conceptually being able to roll back individual transactions isn't
that difficult. All it takes is a bit more memory and CPU - when we
join the item to the transaction we take a copy of the item we are
about to modify.

If we then cancel a dirty transaction, we then roll back all the
dirty items to their original state before we unlock them.  This
works fine for all the on-disk stuff we track in log items.

I have vague thoughts about how this could potentially be tied into
the shadow buffers we already use for keeping a delta copy of all
the committed in-memory changes in the CIL that we haven't yet
committed to the journal - that's actually the entire delta between
what is on disk and what we've changed prior to the current
transaction we are cancelling.

Hence, in theory, a rollback for a dirty log item is simply "read it
from disk again, copy the CIL shadow buffer delta into it".

However, the complexity comes with trying to roll back associated
in-memory state changes that we don't track as log items.  e.g.
incore extent list changes, in memory inode flag state (e.g.
XFS_ISTALE), etc. that's where all the hard problems to solve lie, I
think.

Another problem is how do we rollback from the middle of an intent
(defer ops) chain? We have to complete that chain for things to end
up consistent on disk, so we can't just cancel the current
transaction and say we are done and everything is clean.  Maybe
that's what you are thinking of here - each chain has an "undo"
intent chain that can roll back all the changes already made?

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx