Re: [PATCH] kfence: Avoid stalling work queue task without allocations

Steven Rostedt <rostedt@xxxxxxxxxxx> · Mon, 23 Nov 2020 11:28:12 -0500

On Mon, 23 Nov 2020 16:27:20 +0100
Marco Elver <elver@xxxxxxxxxx> wrote:

> On Fri, Nov 20, 2020 at 02:27PM -0500, Steven Rostedt wrote:
> > On Thu, 19 Nov 2020 13:53:57 +0100
> > Marco Elver <elver@xxxxxxxxxx> wrote:
> >   
> > > Running tests again, along with the function tracer
> > > Running tests on all trace events:
> > > Testing all events: 
> > > BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 12s!  
> > 
> > The below patch might be noisy, but can you add it to the kernel that
> > crashes and see if a particular event causes the issue?
> > 
> > [ note I didn't even compile test. I hope it works ;) ]
> > 
> > Perhaps run it a couple of times to see if it crashes on the same set of
> > events each time.  
> 
> Thanks! I have attached the logs of 2 runs. I think one problem here is
> that the enabling of an event doesn't immediately trigger the problem,
> so it's hard to say which one caused it.
> 

I noticed:

[  237.650900] enabling event benchmark_event

In both traces. Could you disable CONFIG_TRACEPOINT_BENCHMARK and see if
the issue goes away? That event kicks off a thread that spins in a tight
loop for some time and could possibly cause some issues.

It still shouldn't break things, but we can narrow it down if it is the
culprit.
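
To help narrow it down across runs, here is a rough sketch of a script
that lists the events enabled before the first lockup report in each log
and keeps the ones common to both. It assumes every log uses the
"enabling event <name>" line format shown above and that the lockup is
reported with the "BUG: workqueue lockup" string; the function names and
the log file names are made up for illustration.

```python
import re

# Matches lines such as "[  237.650900] enabling event benchmark_event".
# Assumption: the debug patch prints exactly this format in every run.
ENABLE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+enabling event (\S+)")

# Assumption: the lockup is always reported with this string.
LOCKUP_MARK = "BUG: workqueue lockup"


def events_before_lockup(log_text):
    """Return the events enabled before the first lockup report, in order."""
    events = []
    for line in log_text.splitlines():
        if LOCKUP_MARK in line:
            break  # anything enabled after the first report is ignored
        match = ENABLE_RE.match(line)
        if match:
            events.append(match.group(2))
    return events


def common_events(log_a, log_b):
    """Events enabled before the lockup in both runs, in run-A order."""
    seen_in_b = set(events_before_lockup(log_b))
    return [name for name in events_before_lockup(log_a) if name in seen_in_b]


# Hypothetical usage, with run1.log and run2.log being the two attached logs:
#   with open("run1.log") as a, open("run2.log") as b:
#       for name in common_events(a.read(), b.read()):
#           print(name)
```

Since enabling an event does not trigger the problem right away, the
common set only bounds the candidates. It does not single out one event.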

-- Steve