Re: [syzbot] [xfs?] INFO: task hung in __fdget_pos (4)

Al Viro <viro@xxxxxxxxxxxxxxxxxx> · Mon, 4 Sep 2023 04:02:33 +0100

On Mon, Sep 04, 2023 at 11:45:03AM +1000, Dave Chinner wrote:

> > thread B: write()
> > 	finds file
> > 	grabs ->f_pos_lock
> > 	calls into filesystem
> > 	blocks on fs lock held by A
> > thread C: read()/write()/lseek() on the same file
> > 	blocks on ->f_pos_lock
> 
> Yes, that's exactly what I said in a followup email - we need to
> know what happened to thread A, because that might be where we are
> stuck on a leaked lock.
> 
> I saw quite a few reports where lookup/readdir are also stuck trying
> to get an inode lock - those are the "thread B"s in the above example
> - but there's no indication left of what happened with thread A.
> 
> If thread A was blocked all that time on something, then the hung
> task timer should fire on it, too.  If it is running in a tight
> loop, the NMI would have dumped a stack trace from it.
> 
> But neither of those things happened, so it's either leaked
> something or it's in a loop with a short-term sleep, so it doesn't
> trigger the hung task timer. sysrq-w output will capture that
> without all the noise of sysrq-t....

Here's what brought sysrq-t:

| > The report does not have info necessary to figure this out -- no
| > backtrace for whichever thread holds f_pos_lock. I clicked on a
| > bunch of other reports and it is the same story.
| > 
| > Can the kernel be configured to dump backtraces from *all* threads?
| > 
| > If there is no feature like that I can hack it up.
|
| <break>t
|
| over serial console, or echo t >/proc/sysrq-trigger would do it...

A question specifically about getting the stack traces...