Re: current->journal_info got nested! (was Re: [syzbot] [xfs?] [ext4?] general protection fault in jbd2__journal_start)

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 31 Jan 2024 17:02:25 +1100

On Wed, Jan 31, 2024 at 05:20:10AM +0000, Matthew Wilcox wrote:
> On Tue, Jan 30, 2024 at 11:58:22PM -0500, Theodore Ts'o wrote:
> > Hmm, could XFS pre-fault target memory buffer for the bulkstat output
> > before starting its transaction?  Alternatively, ext4 could do a save
> > of current->journal_info before starting to process the page fault,
> > and restore it when it is done.  Both of these seem a bit hacky, and
> > the question is indeed, are there other avenues that might cause the
> > transaction context nesting, such that a more general solution is
> > called for?
> 
> I'd suggest that saving off current->journal_info is risky because
> it might cover a real problem where you've taken a pagefault inside
> a transaction (eg ext4 faulting while in the middle of a transaction on
> the same filesystem that contains the faulting file).

Depends. Look at it from the POV of XFS:

1. we can identify current->journal_info as belonging to XFS because
of the TRAN magic number in the first 32 bits of the structure.

2. the struct xfs_trans has a pointer to the xfs_mount which points
to the VFS superblock, which means we can identify exactly which
filesystem instance the transaction belongs to.

3. We can determine if the transaction has a journal reservation
from the log ticket the transaction holds.

>From these three things, we can identify if we are about to recurse
into the same filesystem, and if so, determine if it is safe to run
new transactions from within that context.

> Seems to me that we shouldn't be writing to userspace while in the
> middle of a transaction.  We could even assert that in copy_to_user()?

On further thinking I don't think the problem is taking page faults
within a filesystem transaction context is the problem here. We're
using the transaction context for automated garbage collection so we
never have to care about leaking object references in the code, not
because we are running a modification and holding a journal
reservation. It's the journals reservations that cannot be allowed
to nest in XFS, not the transactions themselves.

IOWs, the real problem is that multiple filesystems use the same
task field for their own individual purposes, and that then leads
these sorts of recursion problems when a task jumps from one
filesystem context to another and the task field is then
mis-interpretted.

I've had a look at the XFS usage of current->journal_info, and I
think we can just remove it.

It's used for warning that we are running a writepages operations
from within transaction context (which should never happen from XFS
nor memory reclaim). it's also used in the ->destroy_inode path to
determine if we are running in a context where we cannot block on
filesystem locks or transaction reservations. 

The former we can remove with no loss, the latter we can simply
check PF_MEMALLOC_NOFS as that is set by transaction contexts.

IOWs, I think that, in general, page faults in user buffers are fine
in empty XFS transaction contexts as long as we aren't using some
structure that some other filesystem might then to process the
page fault....

This may not be true for other filesystems, but I don't think we
can really say "page faults in any filesystem transaction are unsafe
and should be banned"....

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx