Re: VFS rename hang

Olga Kornievskaia <aglo@xxxxxxxxx> · Thu, 9 Apr 2020 16:15:20 -0400

Hi Trond,

On Thu, Apr 9, 2020 at 3:16 PM Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
>
> Hi Olga,
>
> On Thu, 2020-04-09 at 13:14 -0400, Olga Kornievskaia wrote:
> > Hi folks,
> >
> > This is a rename on an NFS mount. The stack trace is not in NFS,
> > but I'm curious whether anybody has run into this. Thanks.
> >
> > Apr  7 13:34:53 scspr1865142002 kernel:      Not tainted 5.5.7 #1
> > Apr  7 13:34:53 scspr1865142002 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > Apr  7 13:34:53 scspr1865142002 kernel: dt              D    0 24788 24323 0x00000080
> > Apr  7 13:34:53 scspr1865142002 kernel: Call Trace:
> > Apr  7 13:34:53 scspr1865142002 kernel: ? __schedule+0x2ca/0x6e0
> > Apr  7 13:34:53 scspr1865142002 kernel: schedule+0x4a/0xb0
> > Apr  7 13:34:53 scspr1865142002 kernel: schedule_preempt_disabled+0xa/0x10
> > Apr  7 13:34:53 scspr1865142002 kernel: __mutex_lock.isra.11+0x233/0x4e0
> > Apr  7 13:34:53 scspr1865142002 kernel: ? strncpy_from_user+0x47/0x160
> > Apr  7 13:34:53 scspr1865142002 kernel: lock_rename+0x28/0xd0
> > Apr  7 13:34:53 scspr1865142002 kernel: do_renameat2+0x1e7/0x4f0
> > Apr  7 13:34:53 scspr1865142002 kernel: __x64_sys_rename+0x1c/0x20
> > Apr  7 13:34:53 scspr1865142002 kernel: do_syscall_64+0x5b/0x200
> > Apr  7 13:34:53 scspr1865142002 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > Apr  7 13:34:53 scspr1865142002 kernel: RIP: 0033:0x7f747a10ac77
> > Apr  7 13:34:53 scspr1865142002 kernel: Code: Bad RIP value.
> > Apr  7 13:34:53 scspr1865142002 kernel: RSP: 002b:00007f7479f92948 EFLAGS: 00000206 ORIG_RAX: 0000000000000052
> > Apr  7 13:34:53 scspr1865142002 kernel: RAX: ffffffffffffffda RBX: 00000000023604c0 RCX: 00007f747a10ac77
> > Apr  7 13:34:53 scspr1865142002 kernel: RDX: 0000000000000000 RSI: 00007f7479f94a80 RDI: 00007f7479f96b80
> > Apr  7 13:34:53 scspr1865142002 kernel: RBP: 0000000000000005 R08: 00007f7479f9d700 R09: 00007f7479f9d700
> > Apr  7 13:34:53 scspr1865142002 kernel: R10: 645f72656464616c R11: 0000000000000206 R12: 0000000000000001
> > Apr  7 13:34:53 scspr1865142002 kernel: R13: 00007f7479f98c80 R14: 00007f7479f9ad80 R15: 00007f7479f94a80
>
> It looks like the rename locking (i.e. taking the inode mutex on the
> source and target directory) is hung. That likely indicates that
> something else is leaking or holding onto one or more of the directory
> mutexes. Is some other thread/process perhaps also hung on the same
> directory?

Thanks for the reply. I see several hung application processes with
the same stack. The question now is whether some NFS rename is
hanging because the server isn't replying (but in that case I would
expect a stack with a hang somewhere in NFS, and there isn't one).
This is also with nconnect, so I'm not sure whether that has any
effect on this.
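
For anyone who wants to check the same thing on their own box, here
is a rough sketch (not what was run here) that lists D-state tasks
and flags the ones whose stack contains lock_rename. It assumes
/proc/<pid>/stack is readable, which typically needs root and a
kernel built with CONFIG_STACKTRACE. The helper names frames(),
d_state_pids() and report() are made up for illustration.

```python
import os
import re

# One frame as printed by the kernel, e.g. "lock_rename+0x28/0xd0".
# A leading "? " marks a frame the unwinder considers unreliable.
FRAME_RE = re.compile(r"(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def frames(trace_text):
    """Return the reliable function names of a call trace, innermost first.

    Expects each frame on a single line (not wrapped by a mail client).
    """
    return [m.group(2) for m in FRAME_RE.finditer(trace_text)
            if m.group(1) is None]


def d_state_pids(proc="/proc"):
    """Yield PIDs whose state in <proc>/<pid>/stat is D (uninterruptible).

    Only thread-group leaders are listed here; individual threads live
    under <proc>/<pid>/task/<tid> and are not scanned by this sketch.
    """
    for entry in sorted(os.listdir(proc)):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # task exited while we were scanning
        # comm may contain spaces or ')', so split after the last ')'.
        parts = stat.rsplit(")", 1)
        if len(parts) == 2 and parts[1].split()[:1] == ["D"]:
            yield int(entry)


def report(proc="/proc"):
    """Print every D-state task, tagged by whether it sits in lock_rename."""
    for pid in d_state_pids(proc):
        try:
            with open(os.path.join(proc, str(pid), "stack")) as f:
                names = frames(f.read())
        except OSError:
            names = []  # not root, or no CONFIG_STACKTRACE
        tag = "lock_rename" if "lock_rename" in names else "other"
        print(pid, tag, " <- ".join(names))
```

Calling report() as root prints one line per blocked task. A task
tagged "other" is the interesting one, since it may be the holder
that everyone in lock_rename is waiting for. "echo w >
/proc/sysrq-trigger" dumps the same blocked-task stacks to the
kernel log if sysrq is enabled.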

>
> Cheers
>   Trond
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@xxxxxxxxxxxxxxx