Re: [PATCH] xfs: don't use current->journal_info

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 22 Feb 2024 12:10:32 +1100

On Wed, Feb 21, 2024 at 03:25:36PM -0800, Darrick J. Wong wrote:
> On Thu, Feb 22, 2024 at 09:47:23AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > 
> > syzbot reported an ext4 panic during a page fault where found a
> > journal handle when it didn't expect to find one. The structure
> > it tripped over had a value of 'TRAN' in the first entry in the
> > structure, and that indicates it tripped over a struct xfs_trans
> > instead of a jbd2 handle.
> > 
> > The reason for this is that the page fault was taken during a
> > copy-out to a user buffer from an xfs bulkstat operation. XFS uses
> > an "empty" transaction context for bulkstat to do automated metadata
> > buffer cleanup, and so the transaction context is valid across the
> > copyout of the bulkstat info into the user buffer.
> > 
> > We are using empty transaction contexts like this in XFS in more
> > places to reduce the risk of failing to release objects we reference
> > during the operation, especially during error handling. Hence we
> > really need to ensure that we can take page faults from these
> > contexts without leaving landmines for the code processing the page
> > fault to trip over.
> > 
> > We really only use current->journal_info for a single warning check
> > in xfs_vm_writepages() to ensure we aren't doing writeback from a
> > transaction context. Writeback might need to do allocation, so it
> > can need to run transactions itself. Hence it's a debug check to
> > warn us that we've done something silly, and largely it is not all
> > that useful.
> > 
> > So let's just remove all the use of current->journal_info in XFS and
> > get rid of all the potential issues from nested contexts where
> > current->journal_info might get misused by another filesytsem
> > context.
> 
> I wonder if this is too much, though?

We ran XFS for 15+ years without setting current->journal_info, so I
don't see it as a necessary feature...

> Conceptually, I'd rather we set current->journal_info to some random
> number whenever we start a !NO_WRITECOUNT (aka a non-empty) transaction
> and clear it during transaction commit/cancel.  That way, *we* can catch
> the case where some other filesystem starts a transaction and
> accidentally bounces into an updating write fault inside XFS.

I could just leave the ASSERT(current->journal_info == NULL); in
xfs_trans_set_context() and we would catch that case. But, really,
having a page fault from some other filesystem bounce into XFS where
we then have to run a transaction isn't a bug at all.

The problem is purely that we now have two different contexts that
now think they own current->journal_info. IOWs, no filesystem can 
allow page faults while current->journal_info is set by the
filesystem because the page fault processing might use
current->journal_info itself.

If we end up with nested XFS transactions from a page fault whilst
holding an empty transaction, then it isn't an issue as the outer
transaction does not hold a log reservation. The only problem that
might occur is a deadlock if the page fault tries to take the same
locks the upper context holds, but that's not a problem that setting
current->journal_info would solve, anyway...

Hence if XFS doesn't use current->journal_info, then we just don't
care what context we are running the transaction in, above or below
the fault.

> That might be outweighed by the weird semantics of ext4 where the
> zeroness of journal_info changes its behavior in ways I don't want to
> understand.  Thoughts?

That's exactly the problem we're trying to avoid. Either we don't
allow faults in (empty) transaction context, or we don't use
current->journal_info. I prefer the latter as it gives us much more
implementation freedom with empty transaction contexts..

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx