Re: [syzbot] [nfs?] INFO: task hung in nfsd_umount

Harald Dunkel <harri@xxxxxxxxx> · Sun, 29 Sep 2024 10:23:34 +0200

Hi Neil,

On 2024-09-29 00:23:18, NeilBrown wrote:

Thanks for the logs.  They point to flush_workqueue() being a problem,
presumably from nfsd4_probe_callback_sync(), though I'm not 100% sure of
that.  Maybe some deadlock in the callback code.  I'm not very familiar
with that code and nothing immediately jumps out.

I had thought that hung_task_all_cpu_backtrace would show a backtrace of
*all* tasks - I missed the "cpu" in there.
If it happens again and if you can
   echo t > /proc/sysrq-trigger
to get stack traces of everything, that might help.  Maybe it won't be
necessary if I or someone else can spot a deadlock with
flush_workqueue().

I just learned that kernel.hung_task_panic = 1 should have been set, too.
Sorry for that. Currently I have

	kernel.hung_task_panic = 1
	kernel.hung_task_all_cpu_backtrace = 1

Please confirm.

I have set a watchdog to run the sysrq trigger on the NFS server if df
on an NFS client doesn't respond.
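
A minimal sketch of such a watchdog, in case it helps to compare notes.
This is not the exact script I run: the function names, the
PROBE_TIMEOUT and SYSRQ_TRIGGER variables, and the ssh probe in the
usage note are assumptions for illustration. It relies on timeout(1)
from coreutils and must run as root on the NFS server so that it can
write to /proc/sysrq-trigger.

```sh
#!/bin/sh
# Hypothetical watchdog helpers; names and defaults are assumptions.

# probe CMD...: run CMD with a time limit.
# Returns non-zero if CMD fails or does not finish in time.
probe() {
    timeout -s KILL "${PROBE_TIMEOUT:-30}" "$@" >/dev/null 2>&1
}

# dump_tasks: ask the kernel to log stack traces of all tasks (sysrq 't').
# SYSRQ_TRIGGER can be pointed at a scratch file for a dry run.
dump_tasks() {
    echo t > "${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"
}

# watchdog_once CMD...: dump all task stacks if the probe does not
# come back successfully. Returns 1 when a dump was triggered.
watchdog_once() {
    if ! probe "$@"; then
        dump_tasks
        return 1
    fi
    return 0
}
```

On the server this could be called from cron or a loop as, for example,
`watchdog_once ssh nfsclient df /mnt/nfs`, where "nfsclient" and
"/mnt/nfs" are placeholders. The traces end up in the kernel log, so the
log buffer needs to be large enough to hold them.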

Regards
Harri