On Thu, 2020-04-09 at 16:15 -0400, Olga Kornievskaia wrote:
> Hi Trond,
> 
> On Thu, Apr 9, 2020 at 3:16 PM Trond Myklebust
> <trondmy@xxxxxxxxxxxxxxx> wrote:
> > Hi Olga,
> > 
> > On Thu, 2020-04-09 at 13:14 -0400, Olga Kornievskaia wrote:
> > > Hi folks,
> > > 
> > > This is a rename on an NFS mount, but the stack trace is not in
> > > NFS. I'm curious if anybody has run into this. Thanks.
> > > 
> > > Apr 7 13:34:53 scspr1865142002 kernel: Not tainted 5.5.7 #1
> > > Apr 7 13:34:53 scspr1865142002 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > Apr 7 13:34:53 scspr1865142002 kernel: dt D 0 24788 24323 0x00000080
> > > Apr 7 13:34:53 scspr1865142002 kernel: Call Trace:
> > > Apr 7 13:34:53 scspr1865142002 kernel: ? __schedule+0x2ca/0x6e0
> > > Apr 7 13:34:53 scspr1865142002 kernel: schedule+0x4a/0xb0
> > > Apr 7 13:34:53 scspr1865142002 kernel: schedule_preempt_disabled+0xa/0x10
> > > Apr 7 13:34:53 scspr1865142002 kernel: __mutex_lock.isra.11+0x233/0x4e0
> > > Apr 7 13:34:53 scspr1865142002 kernel: ? strncpy_from_user+0x47/0x160
> > > Apr 7 13:34:53 scspr1865142002 kernel: lock_rename+0x28/0xd0
> > > Apr 7 13:34:53 scspr1865142002 kernel: do_renameat2+0x1e7/0x4f0
> > > Apr 7 13:34:53 scspr1865142002 kernel: __x64_sys_rename+0x1c/0x20
> > > Apr 7 13:34:53 scspr1865142002 kernel: do_syscall_64+0x5b/0x200
> > > Apr 7 13:34:53 scspr1865142002 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > Apr 7 13:34:53 scspr1865142002 kernel: RIP: 0033:0x7f747a10ac77
> > > Apr 7 13:34:53 scspr1865142002 kernel: Code: Bad RIP value.
> > > Apr 7 13:34:53 scspr1865142002 kernel: RSP: 002b:00007f7479f92948 EFLAGS: 00000206 ORIG_RAX: 0000000000000052
> > > Apr 7 13:34:53 scspr1865142002 kernel: RAX: ffffffffffffffda RBX: 00000000023604c0 RCX: 00007f747a10ac77
> > > Apr 7 13:34:53 scspr1865142002 kernel: RDX: 0000000000000000 RSI: 00007f7479f94a80 RDI: 00007f7479f96b80
> > > Apr 7 13:34:53 scspr1865142002 kernel: RBP: 0000000000000005 R08: 00007f7479f9d700 R09: 00007f7479f9d700
> > > Apr 7 13:34:53 scspr1865142002 kernel: R10: 645f72656464616c R11: 0000000000000206 R12: 0000000000000001
> > > Apr 7 13:34:53 scspr1865142002 kernel: R13: 00007f7479f98c80 R14: 00007f7479f9ad80 R15: 00007f7479f94a80
> > 
> > It looks like the rename locking (i.e. taking the inode mutex on the
> > source and target directory) is hung. That likely indicates that
> > something else is leaking or holding onto one or more of the
> > directory mutexes. Is some other thread/process perhaps also hung on
> > the same directory?
> 
> Thanks for the reply. I see several hung application processes with
> the same stack. The question now is whether some NFS rename is hanging
> because the server isn't replying (though in that case I'd expect a
> stack hung somewhere in NFS, and there isn't one). This is also with
> nconnect, so I'm not sure whether that has any effect on this.

This is happening on the client side, right? I wouldn't worry too much
about the fact that these stack traces end in the VFS layer. It may
still be trying to lock inodes from NFS.

However, yes, I would normally expect there to be at least some other
process or thread stuck in the NFS layer. If this is NFSv4, is there
perhaps a recovery thread stuck trying to run? You might want to grep
for nfs4_begin_drain_session in /proc/*/stack.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx
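[Ed.: the grep Trond suggests can be run as a small loop; this is a
minimal sketch, not from the original mail. It assumes reading
/proc/<pid>/stack, which requires root, and the per-PID pretty-printing
is purely illustrative.]

```shell
#!/bin/sh
# Scan every task's kernel stack for a stuck NFSv4 session drain.
# /proc/<pid>/stack is root-only; unreadable entries are skipped.
for stack in /proc/*/stack; do
    if grep -q nfs4_begin_drain_session "$stack" 2>/dev/null; then
        pid=$(echo "$stack" | cut -d/ -f3)
        echo "=== pid $pid (comm: $(cat /proc/"$pid"/comm 2>/dev/null)) ==="
        cat "$stack"
    fi
done
```

If this prints a stack, that task is the recovery thread blocked
draining the session, and the hung renames are likely waiting behind it.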