On Sun, 29 Sep 2024, Harald Dunkel wrote:
> Hi Neil,
>
> On 2024-09-29 00:23:18, NeilBrown wrote:
> >
> > Thanks for the logs.  They point to flush_workqueue() being a problem,
> > presumably from nfsd4_probe_callback_sync(), though I'm not 100% sure of
> > that.  Maybe some deadlock in the callback code.  I'm not very familiar
> > with that code and nothing immediately jumps out.
> >
> > I had thought that hung_task_all_cpu_backtrace would show a backtrace of
> > *all* tasks - I missed the "cpu" in there.
> > If it happens again and if you can
> >    echo t > /proc/sysrq-trigger
> > to get stack traces of everything, that might help.  Maybe it won't be
> > necessary if I or someone else can spot a deadlock with
> > flush_workqueue().
> >
>
> I just learned that kernel.hung_task_panic = 1 should have been set, too.
> Sorry for that. Currently I have
>
>     kernel.hung_task_panic = 1
>     kernel.hung_task_all_cpu_backtrace = 1
>
> Please confirm.

You DON'T want hung_task_panic in this case.  If the system panics, it
might do so before the watchdog you mention below fires.  We really need
the sysrq-trigger output and the panic might interfere with that.

Thanks,
NeilBrown

>
> I have set a watchdog to run the sysrq trigger on the NFS server if df
> on an NFS client doesn't respond.
>
>
> Regards
> Harri
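For anyone reproducing this setup, the watchdog mentioned above (run the sysrq trigger on the NFS server when df on a client stops responding) could look roughly like the sketch below. It is not the script actually in use: the mount point, server name, timeout, and root ssh access to the server are all assumptions, and `timeout -s KILL` is used on the assumption that the hung df can still be killed.

```shell
#!/bin/sh
# Hypothetical values, not taken from the thread; adjust to the local setup.
MOUNTPOINT="${MOUNTPOINT:-/mnt/nfs}"     # NFS mount on the client (assumed path)
NFS_SERVER="${NFS_SERVER:-nfs-server}"   # NFS server host name (assumed)
PROBE_TIMEOUT="${PROBE_TIMEOUT:-60}"     # seconds before df counts as hung

# Returns 0 if df answers within the timeout, non-zero otherwise.
# SIGKILL is used because a df blocked on NFS may ignore SIGTERM.
probe_mount() {
    timeout -s KILL "$PROBE_TIMEOUT" df "$MOUNTPOINT" >/dev/null 2>&1
}

# Ask the server to dump stack traces of all tasks to its kernel log.
# Assumes root can ssh to the server without a password prompt.
trigger_sysrq() {
    ssh "root@$NFS_SERVER" 'echo t > /proc/sysrq-trigger'
}

# One watchdog pass: probe the mount, fire the trigger only on a hang.
watchdog_once() {
    if probe_mount; then
        echo "ok"
    else
        echo "hung"
        trigger_sysrq
    fi
}

# Only run when explicitly requested, so the file can also be sourced.
if [ "${RUN_WATCHDOG:-0}" = 1 ]; then
    watchdog_once
fi
```

Run with RUN_WATCHDOG=1 from cron or a timer on the client. With hung_task_panic left at 0 on the server, the traces requested above should end up in the server's kernel log rather than being cut short by a panic.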