On Fri, Aug 12, 2016 at 5:29 PM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote: > On Fri, Aug 12, 2016 at 5:17 PM, James Bottomley > <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote: >> On Fri, 2016-08-12 at 14:29 -0700, Dan Williams wrote: >>> Before spending effort trying to flush the destruction of old bdi >>> instances before new ones are registered, is it rather time to >>> complete the conversion of sd to only use dynamically allocated devt? >> >> Do we have to go that far? Surely your fix is extensible: the only >> reason it doesn't work for us is that the gendisk holds the parent >> without a reference, so we can free the SCSI device before its child >> gendisk (good job no-one actually uses gendisk->parent after we've >> released it ...). If we fix that it would mean SCSI can't release the >> sdev until after the queue is dead and the bdi namespace released, so >> isn't something like this the easy fix? >> >> James >> >> --- >> >> diff --git a/block/genhd.c b/block/genhd.c >> index fcd6d4f..54ae4ae 100644 >> --- a/block/genhd.c >> +++ b/block/genhd.c >> @@ -514,7 +514,7 @@ static void register_disk(struct device *parent, struct gendisk *disk) >> struct hd_struct *part; >> int err; >> >> - ddev->parent = parent; >> + ddev->parent = get_device(parent); >> >> dev_set_name(ddev, "%s", disk->disk_name); >> >> @@ -1144,6 +1144,7 @@ static void disk_release(struct device *dev) >> hd_free_part(&disk->part0); >> if (disk->queue) >> blk_put_queue(disk->queue); >> + put_device(dev->parent); >> kfree(disk); >> } >> struct class block_class = { > > Looks ok at first glance to me. > > We do hold a reference on the parent device, but it gets dropped at > device_unregister() time and this moves it out to the final put. > However, this does leave static devt block-device-drivers that > register a disk without a parent device susceptible to the race... I > think those exist given all the drivers still using add_disk() after > commit 52c44d93c26f "block: remove ->driverfs_dev". So I tried the attached and it makes the libnvdimm unit tests start crashing. A couple crash logs attached. Not yet sure what assumption is getting violated, but how about that conversion of scsi to use dynamic devt? ;-)
[ 30.149817] ========================= [ 30.150729] [ BUG: held lock freed! ] [ 30.151644] 4.8.0-rc1+ #19 Tainted: G O [ 30.152717] ------------------------- [ 30.153635] lt-libndctl/863 is freeing memory ffff880310b7e128-ffff880310b7e927, with a lock still held there! [ 30.155719] (&dev->mutex){......}, at: [<ffffffff815f931d>] device_release_driver+0x1d/0x40 [ 30.157927] 6 locks held by lt-libndctl/863: [ 30.158917] #0: (sb_writers#7){.+.+.+}, at: [<ffffffff8128c537>] __sb_start_write+0xb7/0xf0 [ 30.161267] #1: (&of->mutex){+.+.+.}, at: [<ffffffff8131c191>] kernfs_fop_write+0x101/0x1c0 [ 30.163545] #2: (s_active#3){++++.+}, at: [<ffffffff8131c199>] kernfs_fop_write+0x109/0x1c0 [ 30.165893] #3: (&dev->mutex){......}, at: [<ffffffff815f7a3f>] unbind_store+0xff/0x160 [ 30.168124] #4: (&dev->mutex){......}, at: [<ffffffff815f931d>] device_release_driver+0x1d/0x40 [ 30.170435] #5: (&dev->mutex){......}, at: [<ffffffff815f931d>] device_release_driver+0x1d/0x40 [ 30.172755] [ 30.172755] stack backtrace: [ 30.174156] CPU: 0 PID: 863 Comm: lt-libndctl Tainted: G O 4.8.0-rc1+ #19 [ 30.175988] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 [ 30.178260] 0000000000000086 0000000099c8ecbb ffff8803394538d0 ffffffff814b4f23 [ 30.180325] ffff8803380c4a90 ffff880310b7e128 ffff880339453908 ffffffff81107653 [ 30.182389] ffff880310b7e128 0000000000000282 ffff88033b002940 ffffea000c42de00 [ 30.184456] Call Trace: [ 30.185232] [<ffffffff814b4f23>] dump_stack+0x85/0xc2 [ 30.186328] [<ffffffff81107653>] debug_check_no_locks_freed+0x153/0x160 [ 30.187616] [<ffffffffa003fb1e>] ? namespace_pmem_release+0x2e/0x40 [libnvdimm] [ 30.189391] [<ffffffff8125e039>] kfree+0xc9/0x270 [ 30.190445] [<ffffffffa003fb1e>] namespace_pmem_release+0x2e/0x40 [libnvdimm] [ 30.192197] [<ffffffff815f4042>] device_release+0x32/0x90 [ 30.193333] [<ffffffff814b7a1a>] kobject_release+0x6a/0x170 [ 30.194494] [<ffffffff814b78e7>] kobject_put+0x27/0x50 [ 30.195602] [<ffffffff815f4377>] put_device+0x17/0x20 [ 30.196696] [<ffffffff81493290>] disk_release+0xc0/0x100 [ 30.197828] [<ffffffff815f4042>] device_release+0x32/0x90 [ 30.198966] [<ffffffff814b7a1a>] kobject_release+0x6a/0x170 [ 30.200126] [<ffffffff814b78e7>] kobject_put+0x27/0x50 [ 30.201230] [<ffffffff815f4377>] put_device+0x17/0x20 [ 30.202328] [<ffffffff81214e0a>] bdi_unregister+0x20a/0x290 [ 30.203482] [<ffffffff81107289>] ? trace_hardirqs_on_caller+0x129/0x1b0 [ 30.204769] [<ffffffff8147e2ac>] blk_cleanup_queue+0x18c/0x280 [ 30.205957] [<ffffffffa00a117e>] pmem_release_queue+0xe/0x10 [nd_pmem] [ 30.207228] [<ffffffff815fd49f>] devm_action_release+0xf/0x20 [ 30.208407] [<ffffffff815fdf79>] release_nodes+0x129/0x230 [ 30.209556] [<ffffffff815fe2ec>] devres_release_all+0x3c/0x60 [ 30.210736] [<ffffffff815f9249>] __device_release_driver+0xa9/0x160 [ 30.211979] [<ffffffff815f9325>] device_release_driver+0x25/0x40 [ 30.213187] [<ffffffff815f88a4>] bus_remove_device+0x114/0x190 [ 30.214377] [<ffffffff815f4978>] device_del+0x148/0x270 [ 30.215490] [<ffffffff815f4380>] ? put_device+0x20/0x20 [ 30.216609] [<ffffffffa003f770>] ? nd_region_remove+0x80/0x80 [libnvdimm] [ 30.217914] [<ffffffff815f4aba>] device_unregister+0x1a/0x60 [ 30.219085] [<ffffffffa003b828>] nd_device_unregister+0x48/0x50 [libnvdimm] [ 30.220412] [<ffffffffa003f780>] child_unregister+0x10/0x20 [libnvdimm] [ 30.221696] [<ffffffff815f4470>] device_for_each_child+0x50/0x90 [ 30.222910] [<ffffffffa003f718>] nd_region_remove+0x28/0x80 [libnvdimm] [ 30.224195] [<ffffffffa003bbe1>] nvdimm_bus_remove+0x41/0xa0 [libnvdimm] [ 30.225488] [<ffffffff815f9241>] __device_release_driver+0xa1/0x160 [ 30.226727] [<ffffffff815f9325>] device_release_driver+0x25/0x40 [ 30.227947] [<ffffffff815f7a4f>] unbind_store+0x10f/0x160 [ 30.229081] [<ffffffff815f6fa5>] drv_attr_store+0x25/0x30 [ 30.230215] [<ffffffff8131cee5>] sysfs_kf_write+0x45/0x60 [ 30.231352] [<ffffffff8131c1c5>] kernfs_fop_write+0x135/0x1c0 [ 30.232530] [<ffffffff81288977>] __vfs_write+0x37/0x160 [ 30.233643] [<ffffffff81102f47>] ? update_fast_ctr+0x17/0x30 [ 30.234814] [<ffffffff81102fd9>] ? percpu_down_read+0x49/0x90 [ 30.235988] [<ffffffff8128c537>] ? __sb_start_write+0xb7/0xf0 [ 30.237169] [<ffffffff8128c537>] ? __sb_start_write+0xb7/0xf0 [ 30.238344] [<ffffffff812890d8>] vfs_write+0xb8/0x1b0 [ 30.239444] [<ffffffff8128a568>] SyS_write+0x58/0xc0 [ 30.240534] [<ffffffff8199837c>] entry_SYSCALL_64_fastpath+0x1f/0xbd [ 30.693773] BUG: unable to handle kernel paging request at 0000000040070088 [..] [ 30.695296] IP: [<ffffffff819886ae>] klist_put+0xe/0x90 [ 30.696501] PGD 3383e2067 PUD 0 [ 30.697532] Oops: 0000 [#1] SMP [ 30.698403] Dumping ftrace buffer: [ 30.699306] (ftrace buffer empty) [ 30.700228] Modules linked in: nd_blk(O) nfit_test(O) ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ip6table_mangle ip6table_raw ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_security iptable_mangle iptable_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_security ebtable_filter ebtables ip6table_filter ip6_tables dax_pmem(O) nd_pmem(O) nd_btt(O) dax(O) nd_e820(O) nfit(O) tpm_tis libnvdimm(O) tpm_tis_core tpm serio_raw nfit_test_iomap(O) [ 30.712449] CPU: 0 PID: 863 Comm: lt-libndctl Tainted: G O 4.8.0-rc1+ #19 [ 30.714325] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 [ 30.716650] task: ffff8803380c4240 task.stack: ffff880339450000 [ 30.717868] RIP: 0010:[<ffffffff819886ae>] [<ffffffff819886ae>] klist_put+0xe/0x90 [ 30.719774] RSP: 0018:ffff880339453b40 EFLAGS: 00010246 [ 30.720917] RAX: 0000000000000000 RBX: ffff8803380c4240 RCX: ffff8803380c4240 [ 30.722277] RDX: ffffffff819887dd RSI: 0000000000000001 RDI: 0000000040070088 [ 30.723647] RBP: ffff880339453b60 R08: ffffffff81f4eba0 R09: 0000000000000000 [ 30.725026] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000040070088 [ 30.726395] R13: ffffffffa004d3e0 R14: 0000000000000001 R15: fffffffffffffff2 [ 30.727763] FS: 00007f253e15f840(0000) GS:ffff88033fc00000(0000) knlGS:0000000000000000 [ 30.729641] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 30.730842] CR2: 0000000040070088 CR3: 00000003395d3000 CR4: 00000000000006f0 [ 30.732212] Stack: [ 30.732941] ffff8803380c4240 0000000040070088 ffffffffa004d3e0 ffff880310b79050 [ 30.735047] ffff880339453bc8 ffffffff819887ea ffff8803104ac038 ffff880324ba2848 [ 30.737156] ffffffff81f4eba0 ffffffff81f4eba0 0000000040070088 ffff8803380c4240 [ 30.739249] Call Trace: [ 30.740029] [<ffffffff819887ea>] klist_remove+0x7a/0xe0 [ 30.741166] [<ffffffff815f9291>] __device_release_driver+0xf1/0x160 [ 30.742434] [<ffffffff815f9325>] device_release_driver+0x25/0x40 [ 30.743668] [<ffffffff815f88a4>] bus_remove_device+0x114/0x190 [ 30.744892] [<ffffffff815f4978>] device_del+0x148/0x270 [ 30.746038] [<ffffffff815f4380>] ? put_device+0x20/0x20 [ 30.747185] [<ffffffffa003f770>] ? nd_region_remove+0x80/0x80 [libnvdimm] [ 30.748520] [<ffffffff815f4aba>] device_unregister+0x1a/0x60 [ 30.749715] [<ffffffffa003b828>] nd_device_unregister+0x48/0x50 [libnvdimm] [ 30.751088] [<ffffffffa003f780>] child_unregister+0x10/0x20 [libnvdimm] [ 30.752401] [<ffffffff815f4470>] device_for_each_child+0x50/0x90 [ 30.753650] [<ffffffffa003f718>] nd_region_remove+0x28/0x80 [libnvdimm] [ 30.754979] [<ffffffffa003bbe1>] nvdimm_bus_remove+0x41/0xa0 [libnvdimm] [ 30.756307] [<ffffffff815f9241>] __device_release_driver+0xa1/0x160 [ 30.757583] [<ffffffff815f9325>] device_release_driver+0x25/0x40 [ 30.758828] [<ffffffff815f7a4f>] unbind_store+0x10f/0x160 [ 30.759986] [<ffffffff815f6fa5>] drv_attr_store+0x25/0x30 [ 30.761145] [<ffffffff8131cee5>] sysfs_kf_write+0x45/0x60 [ 30.762299] [<ffffffff8131c1c5>] kernfs_fop_write+0x135/0x1c0 [ 30.763502] [<ffffffff81288977>] __vfs_write+0x37/0x160 [ 30.764641] [<ffffffff81102f47>] ? update_fast_ctr+0x17/0x30 [ 30.765845] [<ffffffff81102fd9>] ? percpu_down_read+0x49/0x90 [ 30.767055] [<ffffffff8128c537>] ? __sb_start_write+0xb7/0xf0 [ 30.768265] [<ffffffff8128c537>] ? __sb_start_write+0xb7/0xf0 [ 30.769477] [<ffffffff812890d8>] vfs_write+0xb8/0x1b0 [ 30.770592] [<ffffffff8128a568>] SyS_write+0x58/0xc0 [ 30.771697] [<ffffffff8199837c>] entry_SYSCALL_64_fastpath+0x1f/0xbd [ 30.772979] Code: f5 55 5d 00 01 e8 b3 6e 72 ff e9 63 ff ff ff e8 69 f5 00 00 e9 72 ff ff ff 0f 1f 40 00 55 48 89 e5 41 56 41 55 41 54 53 41 89 f6 <4c> 8b 27 48 89 fb 49 83 e4 fe 4c 89 e7 4d 8b 6c 24 50 e8 3b f2 [ 30.781430] RIP [<ffffffff819886ae>] klist_put+0xe/0x90 [ 30.782642] RSP <ffff880339453b40> [ 30.783550] CR2: 0000000040070088 [ 30.784495] ---[ end trace 0dcefe91828fba52 ]--- [ 30.786917] Kernel panic - not syncing: Fatal exception [ 30.788400] Dumping ftrace buffer: [ 30.789305] (ftrace buffer empty) [ 30.790225] Kernel Offset: disabled [ 30.791137] Rebooting in 5 seconds..
[ 34.255986] general protection fault: 0000 [#1] SMP [ 34.257026] Dumping ftrace buffer: [ 34.257859] (ftrace buffer empty) [ 34.258713] Modules linked in: nd_blk(O) nfit_test(O) ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables nd_pmem(O) dax_pmem(O) dax(O) nd_btt(O) serio_raw nd_e820(O) nfit(O) libnvdimm(O) tpm_tis tpm_tis_core tpm nfit_test_iomap(O) [ 34.269688] CPU: 1 PID: 871 Comm: lt-libndctl Tainted: G O 4.8.0-rc1+ #19 [ 34.271364] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.1-0-gb3ef39f-prebuilt.qemu-project.org 04/01/2014 [ 34.273450] task: ffff880338b64240 task.stack: ffff880338a80000 [ 34.274564] RIP: 0010:[<ffffffff819888ae>] [<ffffffff819888ae>] klist_next+0x5e/0x120 [ 34.276327] RSP: 0018:ffff880338a83c90 EFLAGS: 00010217 [ 34.277371] RAX: ffff880313384278 RBX: 6b6b6b6b6b6b6b63 RCX: c2482a36b45bcd4d [ 34.278614] RDX: 0000000000000001 RSI: 0000000082f7649c RDI: ffff880313384248 [ 34.279864] RBP: ffff880338a83cb8 R08: 0000000000000000 R09: 0000000000000000 [ 34.281114] R10: 0000000000000000 R11: 0000000000003436 R12: ffff880338a83cc8 [ 34.282356] R13: 0000000000000000 R14: ffff880313384a80 R15: 0000000000000000 [ 34.283593] FS: 00007fa1a7e06840(0000) GS:ffff88033fd00000(0000) knlGS:0000000000000000 [ 34.285306] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 34.286404] CR2: 0000561d5afc9208 CR3: 000000033891c000 CR4: 00000000000006e0 [ 34.287654] Stack: [ 34.288339] 0000000000000000 0000000000000000 ffffffffa002e770 ffffffffa017d100 [ 34.290258] fffffffffffffff2 ffff880338a83cf8 ffffffff815f447b ffff880313384248 [ 34.301588] 0000000000000000 0000000070757190 ffff880312c49868 ffff880312c49868 [ 34.303488] Call Trace: [ 34.304208] [<ffffffffa002e770>] ? nd_region_remove+0x80/0x80 [libnvdimm] [ 34.305428] [<ffffffff815f447b>] device_for_each_child+0x5b/0x90 [ 34.306566] [<ffffffffa002e718>] nd_region_remove+0x28/0x80 [libnvdimm] [ 34.307776] [<ffffffffa002abe1>] nvdimm_bus_remove+0x41/0xa0 [libnvdimm] [ 34.308983] [<ffffffff815f9241>] __device_release_driver+0xa1/0x160 [ 34.310148] [<ffffffff815f9325>] device_release_driver+0x25/0x40 [ 34.311273] [<ffffffff815f7a4f>] unbind_store+0x10f/0x160 [ 34.312330] [<ffffffff815f6fa5>] drv_attr_store+0x25/0x30 [ 34.313391] [<ffffffff8131cee5>] sysfs_kf_write+0x45/0x60 [ 34.314451] [<ffffffff8131c1c5>] kernfs_fop_write+0x135/0x1c0 [ 34.315557] [<ffffffff81288977>] __vfs_write+0x37/0x160 [ 34.316610] [<ffffffff81102f47>] ? update_fast_ctr+0x17/0x30 [ 34.317704] [<ffffffff81102fd9>] ? percpu_down_read+0x49/0x90 [ 34.318805] [<ffffffff8128c537>] ? __sb_start_write+0xb7/0xf0 [ 34.319906] [<ffffffff8128c537>] ? __sb_start_write+0xb7/0xf0 [ 34.321010] [<ffffffff812890d8>] vfs_write+0xb8/0x1b0 [ 34.322029] [<ffffffff8128a568>] SyS_write+0x58/0xc0 [ 34.323039] [<ffffffff8199837c>] entry_SYSCALL_64_fastpath+0x1f/0xbd [ 34.324208] Code: 8d 58 f8 f0 41 83 6e 18 01 74 60 49 8b 3c 24 45 31 ed 48 8d 47 30 45 31 ff 49 c7 44 24 08 00 00 00 00 48 39 c3 0f 84 b4 00 00 00 <f6> 03 01 75 70 b8 01 00 00 00 f0 0f c1 43 18 83 c0 01 83 f8 01 [ 34.331684] RIP [<ffffffff819888ae>] klist_next+0x5e/0x120 [ 34.332798] RSP <ffff880338a83c90> [ 34.333659] ---[ end trace db64022aaca42534 ]---
Attachment:
patch
Description: Binary data