> On May 4, 2016, at 9:14 AM, Anna Schumaker <Anna.Schumaker@xxxxxxxxxx> wrote: > > Hi Dros, > > On 05/03/2016 03:39 PM, Weston Andros Adamson wrote: >> Since the patch "NFS: Allow multiple commit requests in flight per file" >> we can run multiple simultaneous commits on the same inode. This >> introduced a race over collecting pages to commit that made it possible >> to call nfs_init_commit() with an empty list - which causes crashes like >> the one below. >> >> The fix is to catch this race and avoid calling nfs_init_commit and >> initiate_commit when there is no work to do. >> >> Here is the crash: >> >> [600522.076832] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040 >> [600522.078475] IP: [<ffffffffa0479e72>] nfs_init_commit+0x22/0x130 [nfs] >> [600522.078745] PGD 4272b1067 PUD 4272cb067 PMD 0 >> [600522.078972] Oops: 0000 [#1] SMP >> [600522.079204] Modules linked in: nfsv3 nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dcdbas ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw vmw_vsock_vmci_transport vsock bonding ipmi_devintf ipmi_msghandler coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ppdev vmw_balloon parport_pc parport acpi_cpufreq vmw_vmci i2c_piix4 shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c vmwgfx drm_kms_helper ttm drm crc32c_intel serio_raw vmxnet3 >> [600522.081380] vmw_pvscsi ata_generic pata_acpi >> [600522.081809] CPU: 3 PID: 15667 Comm: /usr/bin/python Not tainted 4.1.9-100.pd.88.el7.x86_64 #1 >> [600522.082281] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014 >> [600522.082814] task: ffff8800bbbfa780 ti: ffff88042ae84000 task.ti: ffff88042ae84000 >> [600522.083378] RIP: 0010:[<ffffffffa0479e72>] [<ffffffffa0479e72>] nfs_init_commit+0x22/0x130 [nfs] >> [600522.083973] RSP: 0018:ffff88042ae87438 EFLAGS: 00010246 >> [600522.084571] RAX: 0000000000000000 RBX: ffff880003485e40 RCX: ffff88042ae87588 >> [600522.085188] RDX: 0000000000000000 RSI: ffff88042ae874b0 RDI: ffff880003485e40 >> [600522.085756] RBP: ffff88042ae87448 R08: ffff880003486010 R09: ffff88042ae874b0 >> [600522.086332] R10: 0000000000000000 R11: 0000000000000005 R12: ffff88042ae872d0 >> [600522.086905] R13: ffff88042ae874b0 R14: ffff880003485e40 R15: ffff88042704c840 >> [600522.087484] FS: 00007f4728ff2740(0000) GS:ffff88043fd80000(0000) knlGS:0000000000000000 >> [600522.088070] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b >> [600522.088663] CR2: 0000000000000040 CR3: 000000042b6aa000 CR4: 00000000001406e0 >> [600522.089327] Stack: >> [600522.089926] 0000000000000001 ffff88042ae87588 ffff88042ae874f8 ffffffffa04f09fa >> [600522.090549] 0000000000017840 0000000000017840 ffff88042ae87588 ffff8803258d9930 >> [600522.091169] ffff88042ae87578 ffffffffa0563d80 0000000000000000 ffff88042704c840 >> [600522.091789] Call Trace: >> [600522.092420] [<ffffffffa04f09fa>] pnfs_generic_commit_pagelist+0x1da/0x320 [nfsv4] >> [600522.093052] [<ffffffffa0563d80>] ? ff_layout_commit_prepare_v3+0x30/0x30 [nfs_layout_flexfiles] >> [600522.093696] [<ffffffffa0562645>] ff_layout_commit_pagelist+0x15/0x20 [nfs_layout_flexfiles] >> [600522.094359] [<ffffffffa047bc78>] nfs_generic_commit_list+0xe8/0x120 [nfs] >> [600522.095032] [<ffffffffa047bd6a>] nfs_commit_inode+0xba/0x110 [nfs] >> [600522.095719] [<ffffffffa046ac54>] nfs_release_page+0x44/0xd0 [nfs] >> [600522.096410] [<ffffffff811a8122>] try_to_release_page+0x32/0x50 >> [600522.097109] [<ffffffff811bd4f1>] shrink_page_list+0x961/0xb30 >> [600522.097812] [<ffffffff811bdced>] shrink_inactive_list+0x1cd/0x550 >> [600522.098530] [<ffffffff811bea65>] shrink_lruvec+0x635/0x840 >> [600522.099250] [<ffffffff811bed60>] shrink_zone+0xf0/0x2f0 >> [600522.099974] [<ffffffff811bf312>] do_try_to_free_pages+0x192/0x470 >> [600522.100709] [<ffffffff811bf6ca>] try_to_free_pages+0xda/0x170 >> [600522.101464] [<ffffffff811b2198>] __alloc_pages_nodemask+0x588/0x970 >> [600522.102235] [<ffffffff811fbbd5>] alloc_pages_vma+0xb5/0x230 >> [600522.103000] [<ffffffff813a1589>] ? cpumask_any_but+0x39/0x50 >> [600522.103774] [<ffffffff811d6115>] wp_page_copy.isra.55+0x95/0x490 >> [600522.104558] [<ffffffff810e3438>] ? __wake_up+0x48/0x60 >> [600522.105357] [<ffffffff811d7d3b>] do_wp_page+0xab/0x4f0 >> [600522.106137] [<ffffffff810a1bbb>] ? release_task+0x36b/0x470 >> [600522.106902] [<ffffffff8126dbd7>] ? eventfd_ctx_read+0x67/0x1c0 >> [600522.107659] [<ffffffff811da2a8>] handle_mm_fault+0xc78/0x1900 >> [600522.108431] [<ffffffff81067ef1>] __do_page_fault+0x181/0x420 >> [600522.109173] [<ffffffff811446a6>] ? __audit_syscall_exit+0x1e6/0x280 >> [600522.109893] [<ffffffff810681c0>] do_page_fault+0x30/0x80 >> [600522.110594] [<ffffffff81024f36>] ? syscall_trace_leave+0xc6/0x120 >> [600522.111288] [<ffffffff81790a58>] page_fault+0x28/0x30 >> [600522.111947] Code: 5d c3 0f 1f 80 00 00 00 00 0f 1f 44 00 00 55 4c 8d 87 d0 01 00 00 48 89 e5 53 48 89 fb 48 83 ec 08 4c 8b 0e 49 8b 41 18 4c 39 ce <48> 8b 40 40 4c 8b 50 30 74 24 48 8b 87 d0 01 00 00 48 8b 7e 08 >> [600522.113343] RIP [<ffffffffa0479e72>] nfs_init_commit+0x22/0x130 [nfs] >> [600522.114003] RSP <ffff88042ae87438> >> [600522.114636] CR2: 0000000000000040 >> >> Signed-off-by: Weston Andros Adamson <dros@xxxxxxxxxxxxxxx> >> --- >> fs/nfs/pnfs_nfs.c | 17 +++++++++++++++++ >> fs/nfs/write.c | 4 ++++ >> 2 files changed, 21 insertions(+) >> >> diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c >> index 4aaed890048f..befde63026a6 100644 >> --- a/fs/nfs/pnfs_nfs.c >> +++ b/fs/nfs/pnfs_nfs.c >> @@ -280,6 +280,14 @@ pnfs_generic_commit_pagelist(struct inode *inode, struct list_head *mds_pages, >> list_for_each_entry_safe(data, tmp, &list, pages) { >> list_del_init(&data->pages); >> if (data->ds_commit_index < 0) { >> + /* another commit raced with us */ >> + if (list_empty(mds_pages)) { >> + if (atomic_dec_and_test(&cinfo->mds->rpcs_out)) >> + wake_up_atomic_t(&cinfo->mds->rpcs_out); >> + nfs_commitdata_release(data); >> + continue; >> + } >> + >> nfs_init_commit(data, mds_pages, NULL, cinfo); >> nfs_initiate_commit(NFS_CLIENT(inode), data, >> NFS_PROTO(data->inode), >> @@ -288,6 +296,15 @@ pnfs_generic_commit_pagelist(struct inode *inode, struct list_head *mds_pages, >> LIST_HEAD(pages); >> >> pnfs_fetch_commit_bucket_list(&pages, data, cinfo); >> + >> + /* another commit raced with us */ >> + if (list_empty(&pages)) { >> + if (atomic_dec_and_test(&cinfo->mds->rpcs_out)) >> + wake_up_atomic_t(&cinfo->mds->rpcs_out); >> + nfs_commitdata_release(data); >> + continue; >> + } >> + > > Is there any way to do this without the code duplication in this function? > > Thanks, > Anna I can move the “if empty, then release reference” code out to a helper function that returns a bool, but both blocks will need to call it and continue if the list was empty (and ref released). This has to be called twice because the list it’s inspecting in the second block is built right before (pnfs_fetch_commit_bucket_list). Not a huge reduction in line count, but I can do that if you’d prefer. I’ll have to feed through Primary Data QA again, I’ve found this race somewhat difficult to reproduce. -dros > >> nfs_init_commit(data, &pages, data->lseg, cinfo); >> initiate_commit(data, how); >> } >> diff --git a/fs/nfs/write.c b/fs/nfs/write.c >> index 5f4fd53e5764..f5e613395fc2 100644 >> --- a/fs/nfs/write.c >> +++ b/fs/nfs/write.c >> @@ -1709,6 +1709,10 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how, >> { >> struct nfs_commit_data *data; >> >> + /* another commit raced with us */ >> + if (list_empty(head)) >> + return 0; >> + >> data = nfs_commitdata_alloc(); >> >> if (!data) >> > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html