On Fri, 10 Jul 2015 11:29:10 +0200
William Dauchy <william@xxxxxxxxx> wrote:

> Hello,
>
> We have been testing the two following patches on top of the last 3.14.x.
> (they have been queued up for stable releases)
>
> commit db2efec0caba4f81a22d95a34da640b86c313c8e
> Author: Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
> Date: Tue Jun 30 14:12:30 2015 -0400
>
>     nfs: take extra reference to fl->fl_file when running a LOCKU operation
>
> commit feaff8e5b2cfc3eae02cf65db7a400b0b9ffc596
> Author: Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
> Date: Tue May 12 15:48:10 2015 -0400
>
>     nfs: take extra reference to fl->fl_file when running a setlk
>
>
> It resulted in random instabilities; we are unable to reproduce it reliably for now;
> the only trace we got was the one below.
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 8 at kernel/rcu/tree.c:2191 rcu_do_batch.isra.51+0x384/0x3d0()
> CPU: 0 PID: 8 Comm: rcuc/0 Not tainted 3.14 #1
>  0000000000000009 ffff88061e5dfd10 ffffffff815f5143 0000000000000000
>  ffff88061e5dfd48 ffffffff8105eece 000000000000002f ffff880627c0b600
>  000000000000002f 0000000000000246 0000000000000000 ffff88061e5dfd58
> Call Trace:
>  [<ffffffff815f5143>] dump_stack+0x4d/0x81
>  [<ffffffff8105eece>] warn_slowpath_common+0x6e/0x90
>  [<ffffffff8105efd5>] warn_slowpath_null+0x15/0x20
>  [<ffffffff810bc634>] rcu_do_batch.isra.51+0x384/0x3d0
>  [<ffffffff810bc42a>] ? rcu_do_batch.isra.51+0x17a/0x3d0
>  [<ffffffff810bc9ed>] rcu_cpu_kthread+0xed/0x130
>  [<ffffffff8108aabe>] smpboot_thread_fn+0x18e/0x2e0
>  [<ffffffff8108a930>] ? in_egroup_p+0x40/0x40
>  [<ffffffff8108358c>] kthread+0xec/0x110
>  [<ffffffff810834a0>] ? __kthread_parkme+0x80/0x80
>  [<ffffffff815fcb39>] ret_from_fork+0x49/0x80
>  [<ffffffff810834a0>] ? __kthread_parkme+0x80/0x80
> ---[ end trace 27f9589ec4225b03 ]---

Huh. I'm stumped... These patches are pretty straightforward.
We're just taking an extra reference to the filp when running lock
operations so that it doesn't disappear before the replies can be
processed (typically in the event that a signal comes in while waiting
on the reply).

Given the odd stack trace above, I have to wonder if there's some sort
of memory scribble going on.

Just to be clear...you are mounting with NFSv4 and running something on
the mount when you see this, right? If you don't use NFSv4, then is
everything fine?

Thanks,
-- 
Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
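
For readers unfamiliar with the pattern described above, here is a rough user-space sketch of "take an extra reference for the lifetime of the request". It is not the code from either patch: every `demo_*` name is hypothetical, the counter is a plain int rather than an atomic, and "freeing" only sets a flag so the state can be inspected. In the kernel the corresponding helpers on a struct file are get_file() and fput().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted struct file. */
struct demo_file {
    int refcount;
    int freed;      /* set instead of calling free(), so callers can check it */
};

/* Hypothetical stand-in for an in-flight lock request (setlk or LOCKU). */
struct demo_lock_req {
    struct demo_file *file;
};

static void demo_file_get(struct demo_file *f)
{
    f->refcount++;
}

static void demo_file_put(struct demo_file *f)
{
    assert(f->refcount > 0);
    if (--f->refcount == 0)
        f->freed = 1;   /* last reference gone: the object would be released here */
}

/* Issue the request: pin the file so it outlives the caller if needed. */
static void demo_lock_start(struct demo_lock_req *req, struct demo_file *f)
{
    demo_file_get(f);
    req->file = f;
}

/* Reply processing is finished: drop the reference taken at start. */
static void demo_lock_done(struct demo_lock_req *req)
{
    demo_file_put(req->file);
    req->file = NULL;
}
```

If the caller is interrupted by a signal and drops its own reference before the reply arrives, the file stays alive until demo_lock_done() runs, which is the situation the patches are meant to cover.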