On Wed, Oct 19, 2022 at 6:35 PM Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>
> A report from a tester with this call trace:
>
>  watchdog: BUG: soft lockup - CPU#127 stuck for 134s! [ksoftirqd/127:782]
>  RIP: 0010:_raw_spin_unlock_irqrestore+0x19/0x40
[..]

Whee.

> ...lead me to this thread. This was after I had them force all softirqs
> to run in ksoftirqd context, and run with rq_affinity == 2 to force
> I/O completion work to throttle new submissions.
>
> Willy, are these headed upstream:
>
> https://lore.kernel.org/all/YjSbHp6B9a1G3tuQ@xxxxxxxxxxxxxxxxxxxx
>
> ...or I am missing an alternate solution posted elsewhere?

Can your reporter test that patch? I think it should still apply
pretty much as-is..

And if we actually had somebody who had a test-case that was literally
fixed by getting rid of the old bookmark code, that would make
applying that patch a no-brainer.

The problem is that the original load that caused us to do that thing
in the first place isn't repeatable because it was special production
code - so removing that bookmark code because we _think_ it now hurts
more than it helps is kind of a big hurdle.

But if we had some hard confirmation from somebody that "yes, the
bookmark code is now hurting", that would make it a lot more palatable
to just remove the code that we just _think_ that probably isn't
needed any more..

               Linus