On Tue, Nov 1, 2016 at 11:52 PM, Douglas Miller <dougmill@xxxxxxxxxxxxxxxxxx> wrote:
> On 10/24/2016 01:54 PM, Sreekanth Reddy wrote:
>>
>> Observing the kernel panic below while creating a second raid disk
>> on an LSI SAS3008 HBA card.
>>
>> [ +0.000055] ------------[ cut here ]------------
>> [ +0.000007] WARNING: CPU: 2 PID: 281 at fs/sysfs/dir.c:31
>> sysfs_warn_dup+0x62/0x80
>> [ +0.000002] sysfs: cannot create duplicate filename
>> '/devices/virtual/bdi/8:32'
>> [ +0.000001] Modules linked in: mptctl mptbase xt_CHECKSUM iptable_mangle
>> ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack tun bridge
>> stp llc ebtable_filter ebtables ip6table_filter ip6_tables intel_rapl
>> sb_edac edac_core x86_pkg_temp_pclmul joydev ghash_clmulni_intel iTCO_wdt
>> ipmi_ssif mei_me pcspkr mei iTCO_vendor_support ipmi_si i2c_i801 lpc_ich
>> mfd_corema acpi_pad wmi acpi_power_meter nfsd auth_rpcgss nfs_acl lockd
>> grace binfmt_misc sunrpc xfs libcrc32c ast i2c_algo_bit drm_kore raid_class
>> nvme_core scsi_transport_sas dca
>> [ +0.000067] CPU: 2 PID: 281 Comm: kworker/u49:5 Not tainted 4.9.0-rc2 #1
>> [ +0.000002] Hardware name: Supermicro SYS-2028U-TNRT+/X10DRU-i+, BIOS
>> 1.1 07/22/2015
>> [ +0.000005] Workqueue: events_unbound async_run_entry_fn
>> [ +0.000004] Call Trace:
>> [ +0.000009] [<ffffffff813ca51e>] dump_stack+0x63/0x85
>> [ +0.000005] [<ffffffff810a5bfb>] __warn+0xcb/0xf0
>> [ +0.000004] [<ffffffff810a5c7f>] warn_slowpath_fmt+0x5f/0x80
>> [ +0.000006] [<ffffffff812bf17f>] ? kernfs_path_from_node+0x4f/0x60
>> [ +0.000002] [<ffffffff812c2942>] sysfs_warn_dup+0x62/0x80
>> [ +0.000002] [<ffffffff812c2a27>] sysfs_create_dir_ns+0x77/0x90
>> [ +0.000004] [<ffffffff813ccef9>] kobject_add_internal+0x99/0x330
>> [ +0.000003] [<ffffffff813d6efb>] ? vsnprintf+0x35b/0x4c0
>> [ +0.000003] [<ffffffff813cd6f5>] kobject_add+0x75/0xd0
>> [ +0.000006] [<ffffffff81514e43>] ? device_private_init+0x23/0x70
>> [ +0.000007] [<ffffffff817cb652>] ? mutex_lock+0x12/0x30
>> [ +0.000003] [<ffffffff81514fa9>] device_add+0x119/0x670
>> [ +0.000004] [<ffffffff815156f0>] device_create_groups_vargs+0xe0/0xf0
>> [ +0.000003] [<ffffffff8151571c>] device_create_vargs+0x1c/0x20
>> [ +0.000006] [<ffffffff811d712c>] bdi_register+0x8c/0x180
>> [ +0.000003] [<ffffffff811d7506>] bdi_register_owner+0x36/0x60
>> [ +0.000006] [<ffffffff813ad778>] device_add_disk+0x168/0x480
>> [ +0.000005] [<ffffffff81524891>] ? update_autosuspend+0x51/0x60
>> [ +0.000005] [<ffffffff81557770>] sd_probe_async+0x110/0x1c0
>> [ +0.000002] [<ffffffff810c8a49>] async_run_entry_fn+0x39/0x140
>> [ +0.000003] [<ffffffff810bfa5f>] process_one_work+0x15f/0x430
>> [ +0.000002] [<ffffffff810bfd7e>] worker_thread+0x4e/0x490
>> [ +0.000002] [<ffffffff810bfd30>] ? process_one_work+0x430/0x430
>> [ +0.000003] [<ffffffff810c55a9>] kthread+0xd9/0xf0
>> [ +0.000003] [<ffffffff810c54d0>] ? kthread_park+0x60/0x60
>> [ +0.000003] [<ffffffff817ce595>] ret_from_fork+0x25/0x30
>> [ +0.000002] ------------[ cut here ]------------
>> [ +0.000004] WARNING: CPU: 2 PID: 281 at lib/kobject.c:240
>> kobject_add_internal+0x2bd/0x330
>> [ +0.000001] kobject_add_internal failed for 8:32 with -EEXIST, don't try
>> to register things with the same name in the same
>> [ +0.000001] Modules linked in: mptctl mptbase xt_CHECKSUM iptable_mangle
>> ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack tun bridge
>> stp llc ebtable_filter ebtables ip6table_filter ip6_tables intel_rapl
>> sb_edac edac_core x86_pkg_temp_pclmul joydev ghash_clmulni_intel iTCO_wdt
>> ipmi_ssif mei_me pcspkr mei iTCO_vendor_support ipmi_si i2c_i801 lpc_ich
>> mfd_corema acpi_pad wmi acpi_power_meter nfsd auth_rpcgss nfs_acl lockd
>> grace binfmt_misc sunrpc xfs libcrc32c ast i2c_algo_bit drm_kore raid_class
>> nvme_core scsi_transport_sas dca
>> [ +0.000043] CPU: 2 PID: 281 Comm: kworker/u49:5 Tainted: G W
>> 4.9.0-rc2 #1
>> [ +0.000001] Hardware name: Supermicro SYS-2028U-TNRT+/X10DRU-i+, BIOS
>> 1.1 07/22/2015
>> [ +0.000002] Workqueue: events_unbound async_run_entry_fn
>> [ +0.000003] Call Trace:
>> [ +0.000003] [<ffffffff813ca51e>] dump_stack+0x63/0x85
>> [ +0.000003] [<ffffffff810a5bfb>] __warn+0xcb/0xf0
>> [ +0.000004] [<ffffffff810a5c7f>] warn_slowpath_fmt+0x5f/0x80
>> [ +0.000002] [<ffffffff812c294a>] ? sysfs_warn_dup+0x6a/0x80
>> [ +0.000003] [<ffffffff813cd11d>] kobject_add_internal+0x2bd/0x330
>> [ +0.000003] [<ffffffff813d6efb>] ? vsnprintf+0x35b/0x4c0
>> [ +0.000003] [<ffffffff813cd6f5>] kobject_add+0x75/0xd0
>> [ +0.000003] [<ffffffff81514e43>] ? device_private_init+0x23/0x70
>> [ +0.000004] [<ffffffff817cb652>] ? mutex_lock+0x12/0x30
>> [ +0.000002] [<ffffffff81514fa9>] device_add+0x119/0x670
>> [ +0.000004] [<ffffffff815156f0>] device_create_groups_vargs+0xe0/0xf0
>> [ +0.000003] [<ffffffff8151571c>] device_create_vargs+0x1c/0x20
>> [ +0.000003] [<ffffffff811d712c>] bdi_register+0x8c/0x180
>> [ +0.000003] [<ffffffff811d7506>] bdi_register_owner+0x36/0x60
>> [ +0.000004] [<ffffffff813ad778>] device_add_disk+0x168/0x480
>> [ +0.000003] [<ffffffff81524891>] ? update_autosuspend+0x51/0x60
>> [ +0.000002] [<ffffffff81557770>] sd_probe_async+0x110/0x1c0
>> [ +0.000002] [<ffffffff810c8a49>] async_run_entry_fn+0x39/0x140
>> [ +0.000002] [<ffffffff810bfa5f>] process_one_work+0x15f/0x430
>> [ +0.000002] [<ffffffff810bfd7e>] worker_thread+0x4e/0x490
>> [ +0.000002] [<ffffffff810bfd30>] ? process_one_work+0x430/0x430
>> [ +0.000003] [<ffffffff810c55a9>] kthread+0xd9/0xf0
>> [ +0.000003] [<ffffffff810c54d0>] ? kthread_park+0x60/0x60
>> [ +0.000003] [<ffffffff817ce595>] ret_from_fork+0x25/0x30
>> [ +0.000949] BUG: unable to handle kernel
>> [ +0.005263] NULL pointer dereference
>> [ +0.002853] IP: [<ffffffff812c2c64>]
>> sysfs_do_create_link_sd.isra.2+0x34/0xb0
>> [ +0.008584] PGD 0
>>
>> [ +0.006115] Oops: 0000 [#1] SMP
>> [ +0.004531] Modules linked in: mptctl mptbase xt_CHECKSUM iptable_mangle
>> ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack tun bridge
>> stp llc ebtable_filter ebtables ip6table_filter ip6_tables intel_rapl
>> sb_edac edac_core x86_pkg_temp_pclmul joydev ghash_clmulni_intel iTCO_wdt
>> ipmi_ssif mei_me pcspkr mei iTCO_vendor_support ipmi_si i2c_i801 lpc_ich
>> mfd_corema acpi_pad wmi acpi_power_meter nfsd auth_rpcgss nfs_acl lockd
>> grace binfmt_misc sunrpc xfs libcrc32c ast i2c_algo_bit drm_kore raid_class
>> nvme_core scsi_transport_sas dca
>> [ +0.080566] CPU: 17 PID: 281 Comm: kworker/u49:5 Tainted: G W
>> 4.9.0-rc2 #1
>> [ +0.009472] Hardware name: Supermicro SYS-2028U-TNRT+/X10DRU-i+, BIOS
>> 1.1 07/22/2015
>> [ +0.009169] Workqueue: events_unbound async_run_entry_fn
>> [ +0.007340] RIP: 0010:[<ffffffff812c2c64>] [<ffffffff812c2c64>]
>> sysfs_do_create_link_sd.isra.2+0x34/0xb0
>> [ +0.010294] Call Trace:
>> [ +0.005269] [<ffffffff812c2d05>] sysfs_create_link+0x25/0x40
>> [ +0.008568] [<ffffffff813ad80c>] device_add_disk+0x1fc/0x480
>> [ +0.008551] [<ffffffff81557770>] sd_probe_async+0x110/0x1c0
>> [ +0.008456] [<ffffffff810c8a49>] async_run_entry_fn+0x39/0x140
>> [ +0.010021] [<ffffffff810bfa5f>] process_one_work+0x15f/0x430
>> [ +0.009623] [<ffffffff810bfd7e>] worker_thread+0x4e/0x490
>> [ +0.007422] [<ffffffff810bfd30>] ? process_one_work+0x430/0x430
>> [ +0.008728] [<ffffffff810c55a9>] kthread+0xd9/0xf0
>> [ +0.007578] [<ffffffff810c54d0>] ? kthread_park+0x60/0x60
>> [ +0.006816] [<ffffffff817ce595>] ret_from_fork+0x25/0x30
>> [ +0.006814] Code: 75 48 85 ff 74 70 55 48 89 e5 41 57 41 56 41 55 41 54
>> 49 89 fe 53 48 c7 c7 90 74 01 82 48 89 f3 41 89 cc c5 ff ff c6 05 15 48 d5
>> [ +0.022853] RIP [<ffffffff812c2c64>]
>> sysfs_do_create_link_sd.isra.2+0x34/0xb0
>> [ +0.008679] RSP <ffffc90019c3fd10>
>> [ +0.006129] BUG: unable to handle kernel
>>
>> While analyzing this issue, I observed that when the first raid disk is
>> created, we hide its PD devices (i.e. the devices are still there but
>> have no block device entries). However, the kernel does not remove the
>> BDI entries of these PD devices from /sys/devices/virtual/bdi/; it still
>> shows bdi device entries for these PDs even though they no longer have
>> block device names.
>>
>> e.g.
>> output of 'ls -l /dev/sd*' after creating first raid disk
>> [root@dhcp ~]# ls -l /dev/sd*
>> brw-rw---- 1 root disk 8, 0 Oct 24 17:37 /dev/sda
>> brw-rw---- 1 root disk 8, 1 Oct 24 17:37 /dev/sda1
>> brw-rw---- 1 root disk 8, 2 Oct 24 17:37 /dev/sda2
>> brw-rw---- 1 root disk 8, 3 Oct 24 17:37 /dev/sda3
>> brw-rw---- 1 root disk 8, 16 Oct 24 17:37 /dev/sdb
>> brw-rw---- 1 root disk 8, 64 Oct 24 17:37 /dev/sde
>> brw-rw---- 1 root disk 8, 80 Oct 24 17:37 /dev/sdf
>> brw-rw---- 1 root disk 8, 96 Oct 24 17:37 /dev/sdg
>> brw-rw---- 1 root disk 8, 112 Oct 24 17:37 /dev/sdh
>> brw-rw---- 1 root disk 8, 128 Oct 24 17:37 /dev/sdi
>> brw-rw---- 1 root disk 8, 144 Oct 24 17:37 /dev/sdj
>> brw-rw---- 1 root disk 8, 160 Oct 24 17:41 /dev/sdk
>>
>> output of 'ls -l /sys/devices/virtual/bdi/'
>> [root@dhcp-135-24-192-127 ~]# ls -l /sys/devices/virtual/bdi/
>> total 0
>> drwxr-xr-x 3 root root 0 Oct 24 17:39 259:0
>> drwxr-xr-x 3 root root 0 Oct 24 17:39 8:0
>> drwxr-xr-x 3 root root 0 Oct 24 17:39 8:112
>> drwxr-xr-x 3 root root 0 Oct 24 17:39 8:128
>> drwxr-xr-x 3 root root 0 Oct 24 17:39 8:144
>> drwxr-xr-x 3 root root 0 Oct 24 17:39 8:16
>> drwxr-xr-x 3 root root 0 Oct 24 17:41 8:160
>> drwxr-xr-x 3 root root 0 Oct 24 17:39 8:32
>> drwxr-xr-x 3 root root 0 Oct 24 17:39 8:48
>> drwxr-xr-x 3 root root 0 Oct 24 17:39 8:64
>> drwxr-xr-x 3 root root 0 Oct 24 17:39 8:80
>> drwxr-xr-x 3 root root 0 Oct 24 17:39 8:96
>>
>> Here we can observe that there are no block devices for the
>> '8:32' & '8:48' bdi entries, which are the PDs of raid disk /dev/sdk.
>>
>> Now, while creating a second raid disk, the kernel tries to use
>> MAJOR:MINOR 8:32 for the second raid disk and we observe the
>> kernel oops above.
>>
>> Calling bdi_unregister() in the del_gendisk() function resolves this
>> issue.
>>
>> Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@xxxxxxxxxxxx>
>> ---
>> block/genhd.c | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/block/genhd.c b/block/genhd.c
>> index fcd6d4f..b95f2fa 100644
>> --- a/block/genhd.c
>> +++ b/block/genhd.c
>> @@ -658,6 +658,7 @@ void del_gendisk(struct gendisk *disk)
>>         disk->flags &= ~GENHD_FL_UP;
>>
>>         sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
>> +       bdi_unregister(&disk->queue->backing_dev_info);
>>         blk_unregister_queue(disk);
>>         blk_unregister_region(disk_devt(disk), disk->minors);
>>
> There is a problem with this patch. bdi_unregister() is also called by
> blk_cleanup_queue(), and both that and del_gendisk() may be called by
> cleanup_mapped_device(). This results in a panic when bdi_unregister() is
> called for the second time.

To fix this problem, I have already posted version 2 of the patch. Here is
the patch URL:
https://patchwork.kernel.org/patch/9394471/

Please check this patch and let me know if any changes are needed.

Thanks,
Sreekanth
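
The comparison between the 'ls -l /dev/sd*' and 'ls -l /sys/devices/virtual/bdi/' listings in the quoted report can be automated. What follows is only a minimal userspace sketch, not part of either patch: the helper names parse_devt() and count_stale_bdi() are hypothetical, and it assumes that every live block device has a MAJOR:MINOR link under /sys/dev/block and that bdi entries with major 0 belong to anonymous (non-block) devices and can be ignored.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: parse a sysfs entry name of the form "MAJOR:MINOR".
 * Returns 1 and fills *major / *minor on success, 0 otherwise. */
int parse_devt(const char *name, unsigned *major, unsigned *minor)
{
    char trailing;

    assert(name && major && minor);
    /* Exactly two conversions means nothing follows the minor number. */
    return sscanf(name, "%u:%u%c", major, minor, &trailing) == 2;
}

/* Hypothetical helper: count MAJOR:MINOR entries in bdi_dir that have no
 * entry of the same name in blk_dir, printing each one found.
 * Returns -1 if bdi_dir cannot be opened. */
int count_stale_bdi(const char *bdi_dir, const char *blk_dir)
{
    DIR *d = opendir(bdi_dir);
    struct dirent *e;
    int stale = 0;

    if (!d)
        return -1;

    while ((e = readdir(d)) != NULL) {
        unsigned major, minor;
        char path[512];
        struct stat st;

        if (!parse_devt(e->d_name, &major, &minor))
            continue;   /* skips ".", ".." and named entries */
        if (major == 0)
            continue;   /* assumption: anonymous device, no block node */

        snprintf(path, sizeof(path), "%s/%u:%u", blk_dir, major, minor);
        if (stat(path, &st) != 0) {
            printf("stale bdi entry: %u:%u\n", major, minor);
            stale++;
        }
    }
    closedir(d);
    return stale;
}
```

Calling count_stale_bdi("/sys/devices/virtual/bdi", "/sys/dev/block") from a small main() on the affected system would be expected to report the 8:32 and 8:48 entries, provided the hidden PDs really have no links left under /sys/dev/block.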