Re: [PATCH][stable] nfs: take extra reference to fl->fl_file when running a setlk

Luis Henriques <luis.henriques@xxxxxxxxxxxxx> · Mon, 6 Jul 2015 10:45:16 +0100

On Thu, Jul 02, 2015 at 06:40:17AM -0400, Jeff Layton wrote:
> From: Jeff Layton <jlayton@xxxxxxxxxxxxxxx>
> 
> We had a report of a crash while stress testing the NFS client:
> 
>     BUG: unable to handle kernel NULL pointer dereference at 0000000000000150
>     IP: [<ffffffff8127b698>] locks_get_lock_context+0x8/0x90
>     PGD 0
>     Oops: 0000 [#1] SMP
>     Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_filter ebtable_broute bridge stp llc ebtables ip6table_security ip6table_mangle ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_raw ip6table_filter ip6_tables iptable_security iptable_mangle iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_raw coretemp crct10dif_pclmul ppdev crc32_pclmul crc32c_intel ghash_clmulni_intel vmw_balloon serio_raw vmw_vmci i2c_piix4 shpchp parport_pc acpi_cpufreq parport nfsd auth_rpcgss nfs_acl lockd grace sunrpc vmwgfx drm_kms_helper ttm drm mptspi scsi_transport_spi mptscsih mptbase e1000 ata_generic pata_acpi
>     CPU: 1 PID: 399 Comm: kworker/1:1H Not tainted 4.1.0-0.rc1.git0.1.fc23.x86_64 #1
>     Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/30/2013
>     Workqueue: rpciod rpc_async_schedule [sunrpc]
>     task: ffff880036aea7c0 ti: ffff8800791f4000 task.ti: ffff8800791f4000
>     RIP: 0010:[<ffffffff8127b698>]  [<ffffffff8127b698>] locks_get_lock_context+0x8/0x90
>     RSP: 0018:ffff8800791f7c00  EFLAGS: 00010293
>     RAX: ffff8800791f7c40 RBX: ffff88001f2ad8c0 RCX: ffffe8ffffc80305
>     RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
>     RBP: ffff8800791f7c88 R08: ffff88007fc971d8 R09: 279656d600000000
>     R10: 0000034a01000000 R11: 279656d600000000 R12: ffff88001f2ad918
>     R13: ffff88001f2ad8c0 R14: 0000000000000000 R15: 0000000100e73040
>     FS:  0000000000000000(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
>     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>     CR2: 0000000000000150 CR3: 0000000001c0b000 CR4: 00000000000407e0
>     Stack:
>      ffffffff8127c5b0 ffff8800791f7c18 ffffffffa0171e29 ffff8800791f7c58
>      ffffffffa0171ef8 ffff8800791f7c78 0000000000000246 ffff88001ea0ba00
>      ffff8800791f7c40 ffff8800791f7c40 00000000ff5d86a3 ffff8800791f7ca8
>     Call Trace:
>      [<ffffffff8127c5b0>] ? __posix_lock_file+0x40/0x760
>      [<ffffffffa0171e29>] ? rpc_make_runnable+0x99/0xa0 [sunrpc]
>      [<ffffffffa0171ef8>] ? rpc_wake_up_task_queue_locked.part.35+0xc8/0x250 [sunrpc]
>      [<ffffffff8127cd3a>] posix_lock_file_wait+0x4a/0x120
>      [<ffffffffa03e4f12>] ? nfs41_wake_and_assign_slot+0x32/0x40 [nfsv4]
>      [<ffffffffa03bf108>] ? nfs41_sequence_done+0xd8/0x2d0 [nfsv4]
>      [<ffffffffa03c116d>] do_vfs_lock+0x2d/0x30 [nfsv4]
>      [<ffffffffa03c251d>] nfs4_lock_done+0x1ad/0x210 [nfsv4]
>      [<ffffffffa0171a30>] ? __rpc_sleep_on_priority+0x390/0x390 [sunrpc]
>      [<ffffffffa0171a30>] ? __rpc_sleep_on_priority+0x390/0x390 [sunrpc]
>      [<ffffffffa0171a5c>] rpc_exit_task+0x2c/0xa0 [sunrpc]
>      [<ffffffffa0167450>] ? call_refreshresult+0x150/0x150 [sunrpc]
>      [<ffffffffa0172640>] __rpc_execute+0x90/0x460 [sunrpc]
>      [<ffffffffa0172a25>] rpc_async_schedule+0x15/0x20 [sunrpc]
>      [<ffffffff810baa1b>] process_one_work+0x1bb/0x410
>      [<ffffffff810bacc3>] worker_thread+0x53/0x480
>      [<ffffffff810bac70>] ? process_one_work+0x410/0x410
>      [<ffffffff810bac70>] ? process_one_work+0x410/0x410
>      [<ffffffff810c0b38>] kthread+0xd8/0xf0
>      [<ffffffff810c0a60>] ? kthread_worker_fn+0x180/0x180
>      [<ffffffff817a1aa2>] ret_from_fork+0x42/0x70
>      [<ffffffff810c0a60>] ? kthread_worker_fn+0x180/0x180
> 
> Jean says:
> 
> "Running locktests with a large number of iterations resulted in a
>  client crash.  The test run took a while and hasn't finished after close
>  to 2 hours. The crash happened right after I gave up and killed the test
>  (after 107m) with Ctrl+C."
> 
> The crash happened because a NULL inode pointer got passed into
> locks_get_lock_context. The call chain indicates that file_inode(filp)
> returned NULL, which means that f_inode was NULL. Since that's zeroed
> out in __fput, that suggests that this filp pointer outlived the last
> reference.
> 
> Looking at the code, that seems possible. We copy the struct file_lock
> that's passed in, but if the task is signalled at an inopportune time we
> can end up trying to use that file_lock in rpciod context after the process
> that requested it has already returned (and possibly put its filp
> reference).
> 
> Fix this by taking an extra reference to the filp when we allocate the
> lock info, and put it in nfs4_lock_release.
> 
> Reported-by: Jean Spector <jean@xxxxxxxxxxxxxxx>
> Signed-off-by: Jeff Layton <jeff.layton@xxxxxxxxxxxxxxx>
> Signed-off-by: Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx> # all stable-series kernels
> Upstream commit: feaff8e5b2cfc3eae02cf65db7a400b0b9ffc596

Thanks, I'm queuing it for the 3.16 kernel.

Cheers,
--
Luís

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html