On Wed, Feb 21, 2024 at 03:25:36PM -0800, Darrick J. Wong wrote: > On Thu, Feb 22, 2024 at 09:47:23AM +1100, Dave Chinner wrote: > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > syzbot reported an ext4 panic during a page fault where found a > > journal handle when it didn't expect to find one. The structure > > it tripped over had a value of 'TRAN' in the first entry in the > > structure, and that indicates it tripped over a struct xfs_trans > > instead of a jbd2 handle. > > > > The reason for this is that the page fault was taken during a > > copy-out to a user buffer from an xfs bulkstat operation. XFS uses > > an "empty" transaction context for bulkstat to do automated metadata > > buffer cleanup, and so the transaction context is valid across the > > copyout of the bulkstat info into the user buffer. > > > > We are using empty transaction contexts like this in XFS in more > > places to reduce the risk of failing to release objects we reference > > during the operation, especially during error handling. Hence we > > really need to ensure that we can take page faults from these > > contexts without leaving landmines for the code processing the page > > fault to trip over. > > > > We really only use current->journal_info for a single warning check > > in xfs_vm_writepages() to ensure we aren't doing writeback from a > > transaction context. Writeback might need to do allocation, so it > > can need to run transactions itself. Hence it's a debug check to > > warn us that we've done something silly, and largely it is not all > > that useful. > > > > So let's just remove all the use of current->journal_info in XFS and > > get rid of all the potential issues from nested contexts where > > current->journal_info might get misused by another filesytsem > > context. > > I wonder if this is too much, though? We ran XFS for 15+ years without setting current->journal_info, so I don't see it as a necessary feature... > Conceptually, I'd rather we set current->journal_info to some random > number whenever we start a !NO_WRITECOUNT (aka a non-empty) transaction > and clear it during transaction commit/cancel. That way, *we* can catch > the case where some other filesystem starts a transaction and > accidentally bounces into an updating write fault inside XFS. I could just leave the ASSERT(current->journal_info == NULL); in xfs_trans_set_context() and we would catch that case. But, really, having a page fault from some other filesystem bounce into XFS where we then have to run a transaction isn't a bug at all. The problem is purely that we now have two different contexts that now think they own current->journal_info. IOWs, no filesystem can allow page faults while current->journal_info is set by the filesystem because the page fault processing might use current->journal_info itself. If we end up with nested XFS transactions from a page fault whilst holding an empty transaction, then it isn't an issue as the outer transaction does not hold a log reservation. The only problem that might occur is a deadlock if the page fault tries to take the same locks the upper context holds, but that's not a problem that setting current->journal_info would solve, anyway... Hence if XFS doesn't use current->journal_info, then we just don't care what context we are running the transaction in, above or below the fault. > That might be outweighed by the weird semantics of ext4 where the > zeroness of journal_info changes its behavior in ways I don't want to > understand. Thoughts? That's exactly the problem we're trying to avoid. Either we don't allow faults in (empty) transaction context, or we don't use current->journal_info. I prefer the latter as it gives us much more implementation freedom with empty transaction contexts.. -Dave. -- Dave Chinner david@xxxxxxxxxxxxx