Re: [PATCH] block: Fix kernel panic occurs while creating second raid disk

On Tue, Nov 1, 2016 at 11:52 PM, Douglas Miller
<dougmill@xxxxxxxxxxxxxxxxxx> wrote:
> On 10/24/2016 01:54 PM, Sreekanth Reddy wrote:
>>
>> The kernel panic below is observed while creating a second RAID disk
>> on an LSI SAS3008 HBA card.
>>
>> [  +0.000055] ------------[ cut here ]------------
>> [  +0.000007] WARNING: CPU: 2 PID: 281 at fs/sysfs/dir.c:31
>> sysfs_warn_dup+0x62/0x80
>> [  +0.000002] sysfs: cannot create duplicate filename
>> '/devices/virtual/bdi/8:32'
>> [  +0.000001] Modules linked in: mptctl mptbase xt_CHECKSUM iptable_mangle
>> ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack tun bridge
>> stp llc ebtable_filter ebtables ip6table_filter ip6_tables intel_rapl
>> sb_edac edac_core x86_pkg_temp_pclmul joydev ghash_clmulni_intel iTCO_wdt
>> ipmi_ssif mei_me pcspkr mei iTCO_vendor_support ipmi_si i2c_i801 lpc_ich
>> mfd_corema acpi_pad wmi acpi_power_meter nfsd auth_rpcgss nfs_acl lockd
>> grace binfmt_misc sunrpc xfs libcrc32c ast i2c_algo_bit drm_kore raid_class
>> nvme_core scsi_transport_sas dca
>> [  +0.000067] CPU: 2 PID: 281 Comm: kworker/u49:5 Not tainted 4.9.0-rc2 #1
>> [  +0.000002] Hardware name: Supermicro SYS-2028U-TNRT+/X10DRU-i+, BIOS
>> 1.1 07/22/2015
>> [  +0.000005] Workqueue: events_unbound async_run_entry_fn
>> [  +0.000004] Call Trace:
>> [  +0.000009]  [<ffffffff813ca51e>] dump_stack+0x63/0x85
>> [  +0.000005]  [<ffffffff810a5bfb>] __warn+0xcb/0xf0
>> [  +0.000004]  [<ffffffff810a5c7f>] warn_slowpath_fmt+0x5f/0x80
>> [  +0.000006]  [<ffffffff812bf17f>] ? kernfs_path_from_node+0x4f/0x60
>> [  +0.000002]  [<ffffffff812c2942>] sysfs_warn_dup+0x62/0x80
>> [  +0.000002]  [<ffffffff812c2a27>] sysfs_create_dir_ns+0x77/0x90
>> [  +0.000004]  [<ffffffff813ccef9>] kobject_add_internal+0x99/0x330
>> [  +0.000003]  [<ffffffff813d6efb>] ? vsnprintf+0x35b/0x4c0
>> [  +0.000003]  [<ffffffff813cd6f5>] kobject_add+0x75/0xd0
>> [  +0.000006]  [<ffffffff81514e43>] ? device_private_init+0x23/0x70
>> [  +0.000007]  [<ffffffff817cb652>] ? mutex_lock+0x12/0x30
>> [  +0.000003]  [<ffffffff81514fa9>] device_add+0x119/0x670
>> [  +0.000004]  [<ffffffff815156f0>] device_create_groups_vargs+0xe0/0xf0
>> [  +0.000003]  [<ffffffff8151571c>] device_create_vargs+0x1c/0x20
>> [  +0.000006]  [<ffffffff811d712c>] bdi_register+0x8c/0x180
>> [  +0.000003]  [<ffffffff811d7506>] bdi_register_owner+0x36/0x60
>> [  +0.000006]  [<ffffffff813ad778>] device_add_disk+0x168/0x480
>> [  +0.000005]  [<ffffffff81524891>] ? update_autosuspend+0x51/0x60
>> [  +0.000005]  [<ffffffff81557770>] sd_probe_async+0x110/0x1c0
>> [  +0.000002]  [<ffffffff810c8a49>] async_run_entry_fn+0x39/0x140
>> [  +0.000003]  [<ffffffff810bfa5f>] process_one_work+0x15f/0x430
>> [  +0.000002]  [<ffffffff810bfd7e>] worker_thread+0x4e/0x490
>> [  +0.000002]  [<ffffffff810bfd30>] ? process_one_work+0x430/0x430
>> [  +0.000003]  [<ffffffff810c55a9>] kthread+0xd9/0xf0
>> [  +0.000003]  [<ffffffff810c54d0>] ? kthread_park+0x60/0x60
>> [  +0.000003]  [<ffffffff817ce595>] ret_from_fork+0x25/0x30
>> [  +0.000002] ------------[ cut here ]------------
>> [  +0.000004] WARNING: CPU: 2 PID: 281 at lib/kobject.c:240
>> kobject_add_internal+0x2bd/0x330
>> [  +0.000001] kobject_add_internal failed for 8:32 with -EEXIST, don't try
>> to register things with the same name in the same
>> [  +0.000001] Modules linked in: mptctl mptbase xt_CHECKSUM iptable_mangle
>> ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack tun bridge
>> stp llc ebtable_filter ebtables ip6table_filter ip6_tables intel_rapl
>> sb_edac edac_core x86_pkg_temp_pclmul joydev ghash_clmulni_intel iTCO_wdt
>> ipmi_ssif mei_me pcspkr mei iTCO_vendor_support ipmi_si i2c_i801 lpc_ich
>> mfd_corema acpi_pad wmi acpi_power_meter nfsd auth_rpcgss nfs_acl lockd
>> grace binfmt_misc sunrpc xfs libcrc32c ast i2c_algo_bit drm_kore raid_class
>> nvme_core scsi_transport_sas dca
>> [  +0.000043] CPU: 2 PID: 281 Comm: kworker/u49:5 Tainted: G        W
>> 4.9.0-rc2 #1
>> [  +0.000001] Hardware name: Supermicro SYS-2028U-TNRT+/X10DRU-i+, BIOS
>> 1.1 07/22/2015
>> [  +0.000002] Workqueue: events_unbound async_run_entry_fn
>> [  +0.000003] Call Trace:
>> [  +0.000003]  [<ffffffff813ca51e>] dump_stack+0x63/0x85
>> [  +0.000003]  [<ffffffff810a5bfb>] __warn+0xcb/0xf0
>> [  +0.000004]  [<ffffffff810a5c7f>] warn_slowpath_fmt+0x5f/0x80
>> [  +0.000002]  [<ffffffff812c294a>] ? sysfs_warn_dup+0x6a/0x80
>> [  +0.000003]  [<ffffffff813cd11d>] kobject_add_internal+0x2bd/0x330
>> [  +0.000003]  [<ffffffff813d6efb>] ? vsnprintf+0x35b/0x4c0
>> [  +0.000003]  [<ffffffff813cd6f5>] kobject_add+0x75/0xd0
>> [  +0.000003]  [<ffffffff81514e43>] ? device_private_init+0x23/0x70
>> [  +0.000004]  [<ffffffff817cb652>] ? mutex_lock+0x12/0x30
>> [  +0.000002]  [<ffffffff81514fa9>] device_add+0x119/0x670
>> [  +0.000004]  [<ffffffff815156f0>] device_create_groups_vargs+0xe0/0xf0
>> [  +0.000003]  [<ffffffff8151571c>] device_create_vargs+0x1c/0x20
>> [  +0.000003]  [<ffffffff811d712c>] bdi_register+0x8c/0x180
>> [  +0.000003]  [<ffffffff811d7506>] bdi_register_owner+0x36/0x60
>> [  +0.000004]  [<ffffffff813ad778>] device_add_disk+0x168/0x480
>> [  +0.000003]  [<ffffffff81524891>] ? update_autosuspend+0x51/0x60
>> [  +0.000002]  [<ffffffff81557770>] sd_probe_async+0x110/0x1c0
>> [  +0.000002]  [<ffffffff810c8a49>] async_run_entry_fn+0x39/0x140
>> [  +0.000002]  [<ffffffff810bfa5f>] process_one_work+0x15f/0x430
>> [  +0.000002]  [<ffffffff810bfd7e>] worker_thread+0x4e/0x490
>> [  +0.000002]  [<ffffffff810bfd30>] ? process_one_work+0x430/0x430
>> [  +0.000003]  [<ffffffff810c55a9>] kthread+0xd9/0xf0
>> [  +0.000003]  [<ffffffff810c54d0>] ? kthread_park+0x60/0x60
>> [  +0.000003]  [<ffffffff817ce595>] ret_from_fork+0x25/0x30
>> [  +0.000949] BUG: unable to handle kernel
>> [  +0.005263] NULL pointer dereference
>> [  +0.002853] IP: [<ffffffff812c2c64>]
>> sysfs_do_create_link_sd.isra.2+0x34/0xb0
>> [  +0.008584] PGD 0
>>
>> [  +0.006115] Oops: 0000 [#1] SMP
>> [  +0.004531] Modules linked in: mptctl mptbase xt_CHECKSUM iptable_mangle
>> ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack tun bridge
>> stp llc ebtable_filter ebtables ip6table_filter ip6_tables intel_rapl
>> sb_edac edac_core x86_pkg_temp_pclmul joydev ghash_clmulni_intel iTCO_wdt
>> ipmi_ssif mei_me pcspkr mei iTCO_vendor_support ipmi_si i2c_i801 lpc_ich
>> mfd_corema acpi_pad wmi acpi_power_meter nfsd auth_rpcgss nfs_acl lockd
>> grace binfmt_misc sunrpc xfs libcrc32c ast i2c_algo_bit drm_kore raid_class
>> nvme_core scsi_transport_sas dca
>> [  +0.080566] CPU: 17 PID: 281 Comm: kworker/u49:5 Tainted: G        W
>> 4.9.0-rc2 #1
>> [  +0.009472] Hardware name: Supermicro SYS-2028U-TNRT+/X10DRU-i+, BIOS
>> 1.1 07/22/2015
>> [  +0.009169] Workqueue: events_unbound async_run_entry_fn
>> [  +0.007340] RIP: 0010:[<ffffffff812c2c64>] [<ffffffff812c2c64>]
>> sysfs_do_create_link_sd.isra.2+0x34/0xb0
>> [  +0.010294] Call Trace:
>> [  +0.005269]  [<ffffffff812c2d05>] sysfs_create_link+0x25/0x40
>> [  +0.008568]  [<ffffffff813ad80c>] device_add_disk+0x1fc/0x480
>> [  +0.008551]  [<ffffffff81557770>] sd_probe_async+0x110/0x1c0
>> [  +0.008456]  [<ffffffff810c8a49>] async_run_entry_fn+0x39/0x140
>> [  +0.010021]  [<ffffffff810bfa5f>] process_one_work+0x15f/0x430
>> [  +0.009623]  [<ffffffff810bfd7e>] worker_thread+0x4e/0x490
>> [  +0.007422]  [<ffffffff810bfd30>] ? process_one_work+0x430/0x430
>> [  +0.008728]  [<ffffffff810c55a9>] kthread+0xd9/0xf0
>> [  +0.007578]  [<ffffffff810c54d0>] ? kthread_park+0x60/0x60
>> [  +0.006816]  [<ffffffff817ce595>] ret_from_fork+0x25/0x30
>> [  +0.006814] Code: 75 48 85 ff 74 70 55 48 89 e5 41 57 41 56 41 55 41 54
>> 49 89 fe 53 48 c7 c7 90 74 01 82 48 89 f3 41 89 cc  c5 ff ff c6 05 15 48 d5
>> [  +0.022853] RIP  [<ffffffff812c2c64>]
>> sysfs_do_create_link_sd.isra.2+0x34/0xb0
>> [  +0.008679]  RSP <ffffc90019c3fd10>
>> [  +0.006129] BUG: unable to handle kernel
>>
>> While analyzing this issue, I observed that when the first RAID disk is
>> created, we hide its PD (physical disk) devices, i.e. each device is still
>> there but no longer has a block device entry. However, the kernel does not
>> remove the BDI entries of these PDs from /sys/devices/virtual/bdi/; the
>> entries remain even though the PDs no longer have block device names.
>>
>> e.g.
>> output of 'ls -l /dev/sd*' after creating first raid disk
>> [root@dhcp ~]# ls -l /dev/sd*
>> brw-rw---- 1 root disk 8,   0 Oct 24 17:37 /dev/sda
>> brw-rw---- 1 root disk 8,   1 Oct 24 17:37 /dev/sda1
>> brw-rw---- 1 root disk 8,   2 Oct 24 17:37 /dev/sda2
>> brw-rw---- 1 root disk 8,   3 Oct 24 17:37 /dev/sda3
>> brw-rw---- 1 root disk 8,  16 Oct 24 17:37 /dev/sdb
>> brw-rw---- 1 root disk 8,  64 Oct 24 17:37 /dev/sde
>> brw-rw---- 1 root disk 8,  80 Oct 24 17:37 /dev/sdf
>> brw-rw---- 1 root disk 8,  96 Oct 24 17:37 /dev/sdg
>> brw-rw---- 1 root disk 8, 112 Oct 24 17:37 /dev/sdh
>> brw-rw---- 1 root disk 8, 128 Oct 24 17:37 /dev/sdi
>> brw-rw---- 1 root disk 8, 144 Oct 24 17:37 /dev/sdj
>> brw-rw---- 1 root disk 8, 160 Oct 24 17:41 /dev/sdk
>>
>> output of 'ls -l /sys/devices/virtual/bdi/'
>> [root@dhcp-135-24-192-127 ~]# ls -l /sys/devices/virtual/bdi/
>> total 0
>> drwxr-xr-x 3 root root 0 Oct 24 17:39 259:0
>> drwxr-xr-x 3 root root 0 Oct 24 17:39 8:0
>> drwxr-xr-x 3 root root 0 Oct 24 17:39 8:112
>> drwxr-xr-x 3 root root 0 Oct 24 17:39 8:128
>> drwxr-xr-x 3 root root 0 Oct 24 17:39 8:144
>> drwxr-xr-x 3 root root 0 Oct 24 17:39 8:16
>> drwxr-xr-x 3 root root 0 Oct 24 17:41 8:160
>> drwxr-xr-x 3 root root 0 Oct 24 17:39 8:32
>> drwxr-xr-x 3 root root 0 Oct 24 17:39 8:48
>> drwxr-xr-x 3 root root 0 Oct 24 17:39 8:64
>> drwxr-xr-x 3 root root 0 Oct 24 17:39 8:80
>> drwxr-xr-x 3 root root 0 Oct 24 17:39 8:96
>>
>> Here we can see that there are no block devices for the '8:32' and '8:48'
>> bdi entries, which belong to the PDs of the RAID disk /dev/sdk.
>>
>> Now, while creating a second RAID disk, the kernel tries to use
>> MAJOR:MINOR 8:32 for it, bdi_register() fails because the 8:32 entry
>> already exists, and we hit the kernel oops shown above.
>>
>> Calling bdi_unregister() in del_gendisk() resolves this issue.
>>
>> Signed-off-by: Sreekanth Reddy <Sreekanth.Reddy@xxxxxxxxxxxx>
>> ---
>>  block/genhd.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/block/genhd.c b/block/genhd.c
>> index fcd6d4f..b95f2fa 100644
>> --- a/block/genhd.c
>> +++ b/block/genhd.c
>> @@ -658,6 +658,7 @@ void del_gendisk(struct gendisk *disk)
>>      disk->flags &= ~GENHD_FL_UP;
>>
>>      sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
>> +    bdi_unregister(&disk->queue->backing_dev_info);
>>      blk_unregister_queue(disk);
>>      blk_unregister_region(disk_devt(disk), disk->minors);
>>
> There is a problem with this patch. bdi_unregister() is also called by
> blk_cleanup_queue(), and both that and del_gendisk() may be called by
> cleanup_mapped_device(). This results in a panic when bdi_unregister() is
> called for the second time.

To fix this problem, I have already posted a version 2 patch; here is
the patch URL:
https://patchwork.kernel.org/patch/9394471/

Please check this patch and let me know if any changes are needed.

Thanks,
Sreekanth


