Hi Trond,

On Thu, Apr 9, 2020 at 3:16 PM Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>
> Hi Olga,
>
> On Thu, 2020-04-09 at 13:14 -0400, Olga Kornievskaia wrote:
> > Hi folks,
> >
> > This is a rename on an NFS mount but the stack trace is not in NFS,
> > but I'm curious if any body ran into this. Thanks.
> >
> > Apr 7 13:34:53 scspr1865142002 kernel: Not tainted 5.5.7 #1
> > Apr 7 13:34:53 scspr1865142002 kernel: "echo 0 >
> > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Apr 7 13:34:53 scspr1865142002 kernel: dt D 0 24788
> > 24323 0x00000080
> > Apr 7 13:34:53 scspr1865142002 kernel: Call Trace:
> > Apr 7 13:34:53 scspr1865142002 kernel: ? __schedule+0x2ca/0x6e0
> > Apr 7 13:34:53 scspr1865142002 kernel: schedule+0x4a/0xb0
> > Apr 7 13:34:53 scspr1865142002 kernel:
> > schedule_preempt_disabled+0xa/0x10
> > Apr 7 13:34:53 scspr1865142002 kernel:
> > __mutex_lock.isra.11+0x233/0x4e0
> > Apr 7 13:34:53 scspr1865142002 kernel: ?
> > strncpy_from_user+0x47/0x160
> > Apr 7 13:34:53 scspr1865142002 kernel: lock_rename+0x28/0xd0
> > Apr 7 13:34:53 scspr1865142002 kernel: do_renameat2+0x1e7/0x4f0
> > Apr 7 13:34:53 scspr1865142002 kernel: __x64_sys_rename+0x1c/0x20
> > Apr 7 13:34:53 scspr1865142002 kernel: do_syscall_64+0x5b/0x200
> > Apr 7 13:34:53 scspr1865142002 kernel:
> > entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > Apr 7 13:34:53 scspr1865142002 kernel: RIP: 0033:0x7f747a10ac77
> > Apr 7 13:34:53 scspr1865142002 kernel: Code: Bad RIP value.
> > Apr 7 13:34:53 scspr1865142002 kernel: RSP: 002b:00007f7479f92948
> > EFLAGS: 00000206 ORIG_RAX: 0000000000000052
> > Apr 7 13:34:53 scspr1865142002 kernel: RAX: ffffffffffffffda RBX:
> > 00000000023604c0 RCX: 00007f747a10ac77
> > Apr 7 13:34:53 scspr1865142002 kernel: RDX: 0000000000000000 RSI:
> > 00007f7479f94a80 RDI: 00007f7479f96b80
> > Apr 7 13:34:53 scspr1865142002 kernel: RBP: 0000000000000005 R08:
> > 00007f7479f9d700 R09: 00007f7479f9d700
> > Apr 7 13:34:53 scspr1865142002 kernel: R10: 645f72656464616c R11:
> > 0000000000000206 R12: 0000000000000001
> > Apr 7 13:34:53 scspr1865142002 kernel: R13: 00007f7479f98c80 R14:
> > 00007f7479f9ad80 R15: 00007f7479f94a80
>
> It looks like the rename locking (i.e. taking the inode mutex on the
> source and target directory) is hung. That likely indicates that
> something else is leaking or holding onto one or more of the directory
> mutexes. Is some other thread/process perhaps also hung on the same
> directory?

Thanks for the reply. I see several hung application processes with the
same stack. The question now is whether some NFS rename is hanging
because the server isn't replying (though in that case I would expect
to see a stack hung somewhere in NFS, and there isn't one). This is
also with nconnect, so I'm not sure whether that has any effect on this.

>
> Cheers
> Trond
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx
>
>
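
One way to look for whatever is holding the directory mutex is to dump
the kernel stack of every task in uninterruptible sleep and see whether
any of them is sitting in NFS or RPC code rather than in lock_rename.
The following is only a minimal sketch under some assumptions: the
helper names (task_state, blocked_tasks, mentions_nfs) are made up for
illustration, /proc/<pid>/task/<tid>/stack has to be available on the
running kernel and normally needs root to read, and matching on "nfs"
or "rpc" in a symbol name is a rough heuristic, not a reliable test.

```python
import os


def task_state(stat_text):
    """Return the one-letter state from the contents of a /proc stat file."""
    # The comm field is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the last ')' before taking the state.
    return stat_text.rsplit(")", 1)[1].split()[0]


def _read(path):
    with open(path) as f:
        return f.read()


def blocked_tasks(proc_root="/proc"):
    """Yield (tid, comm, kernel_stack) for every thread in state D."""
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            base = os.path.join(task_dir, tid)
            try:
                if task_state(_read(os.path.join(base, "stat"))) != "D":
                    continue
                comm = _read(os.path.join(base, "comm")).strip()
                stack = _read(os.path.join(base, "stack"))
            except OSError:
                continue  # thread exited, or stack not readable (not root)
            yield int(tid), comm, stack


def mentions_nfs(stack_text):
    """Heuristic: does any frame name contain 'nfs' or 'rpc'?"""
    text = stack_text.lower()
    return "nfs" in text or "rpc" in text
```

Run as root, iterating over blocked_tasks() and printing each stack
together with the result of mentions_nfs(stack) would show whether all
D-state threads are waiting in lock_rename, or whether one of them is
blocked elsewhere while presumably still holding the directory lock.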