On Mon, Feb 22, 2021 at 08:51:05PM -0800, Darrick J. Wong wrote: > On Tue, Feb 23, 2021 at 02:28:37PM +1100, Dave Chinner wrote: > > On Mon, Feb 22, 2021 at 06:15:57PM -0800, Darrick J. Wong wrote: > > > On Tue, Feb 23, 2021 at 10:31:07AM +1100, Dave Chinner wrote: > > > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > > > > > Because the iomap code using PF_MEMALLOC_NOFS to detect transaction > > > > recursion in XFS is just wrong. Remove it from the iomap code and > > > > replace it with XFS specific internal checks using > > > > current->journal_info instead. > > > > > > It might be worth mentioning that this changes the PF_MEMALLOC_NOFS > > > behavior very slightly -- it's now bound to the allocation and freeing > > > of the transaction, instead of the strange way we used to do this, where > > > we'd set it at reservation time but we don't /clear/ it at unreserve time. > > > > They are effectively the same thing, so I think you are splitting > > hairs here. The rule is "transaction context is NOFS" so whether it > > is set when the transaction context is entered or a few instructions > > later when we start the reservation is not significant. > > > > > This doesn't strictly look like a fix patch, but as it is a Dumb > > > Developer Detector(tm) I could try to push it for 5.12 ... or just make > > > it one of the first 5.13 patches. Any preference? > > > > Nope. You're going to need to fix the transaction nesting the new gc > > code does before applying this, though, because that is detected as > > transaction recursion by this patch.... > > Well yes, I was trying to see if I could throw in the fix patch and the > idiot detector, both at the same time... :) > > That said, it crashes in xfs/229: > > 2822 args->result = __xfs_btree_split(args->cur, args->level, args->ptrp, > 2823 args->key, args->curp, args->stat); > 2824 complete(args->done); > 2825 > > 2826 xfs_trans_clear_context(args->cur->bc_tp); > 2827 current_restore_flags_nested(&pflags, new_pflags); > > It's possible for the original wait_for_completion() in > xfs_btree_split() to wake up immediately after complete() drops the > lock. If it returns (and blows away the stack variable @args) before > the worker resumes, then the worker will be dereferencing freed stack > memory and blows up: Argh. So I left an undocumented landmine in that code that I then stepped on myself years later. Easy to fix, just clear the context before calling done. I'll go re-attach my leg, then update it and drop a comment in there, too. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx