On 04/25/2018 12:58 AM, Alexis Castilla wrote:
Hi, community!
When trying to re-add an already-removed disk to a RAID 10 array, the
kernel crashes.
Running on x86_64 (Intel Xeon) and Arch Linux.
I've written a script that reproduces the issue:
#!/bin/sh
# Wipe any stale RAID metadata from the three partitions.
mdadm --zero-superblock /dev/sdp4
mdadm --zero-superblock /dev/sdo4
mdadm --zero-superblock /dev/sdn4
# Create a 3-device RAID10 with the far-3 layout (three copies of the data).
mdadm --create -vvv --force --run --metadata=1.2 /dev/md0 --level=10 \
    --chunk=128 --layout=f3 --raid-devices=3 /dev/sdp4 /dev/sdo4 /dev/sdn4
sleep 5
# Fail and remove one member, then re-add it; the crash happens during
# the recovery that the re-add kicks off.
mdadm --fail -vvv /dev/md0 sdn4
sleep 5
mdadm --remove -vvv /dev/md0 sdn4
sleep 5
mdadm --add /dev/md0 /dev/sdn4
That is enough to trigger this issue.
It seems related to the RAID layout configuration: I cannot reproduce it
with f2 or n2, but it does trigger with f3 or n3, i.e., with layouts
that keep three copies.
Tested with mdadm 4.0.
Also tested on kernels 4.14.18, 4.14.35, and 4.16.3; the issue is the
same on all of them.
Has anyone seen something similar?
I'm not a kernel expert, but I will try to help as far as I can.
Thanks.
[ 55.951379] md/raid10:md0: Disk failure on sdn4, disabling device.
[ 55.951379] md/raid10:md0: Operation continuing on 2 devices.
[ 88.257917] md/raid10:md0: Disk failure on sdn4, disabling device.
[ 88.257917] md/raid10:md0: Operation continuing on 2 devices.
[ 98.347009] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[ 98.355783] IP: r10buf_pool_free+0x38/0xe0 [raid10]
[ 98.361239] PGD 0 P4D 0
[ 98.364076] Oops: 0000 [#1] SMP PTI
[ 98.367979] Modules linked in: raid10 md_mod mlx4_ib mlx4_en
ib_core ipmi_ssif intel_rapl x86_pkg_temp_thermal intel_powerclamp
coretemp mgag200 kvm_intel i2c_algo_bit ttm kvm drm_kms_helper drm
irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc
aesni_intel aes_x86_64 crypto_simd nls_iso8859_1 glue_helper nls_cp437
cryptd agpgart input_leds joydev iTCO_wdt vfat syscopyarea
intel_cstate evdev mousedev iTCO_vendor_support led_class sysfillrect
fat sysimgblt intel_rapl_perf pcspkr mac_hid i2c_i801 lpc_ich
mlx4_core fb_sys_fops e1000e ixgbe mei_me mei devlink mdio ptp
pps_core dca shpchp ipmi_si ipmi_devintf wmi ipmi_msghandler button
sch_fq_codel ip_tables x_tables xfs libcrc32c crc32c_generic sr_mod
ses cdrom enclosure sd_mod hid_generic usbhid hid uas usb_storage isci
ahci libsas libahci
[ 98.447354] ehci_pci ehci_hcd crc32c_intel mpt3sas raid_class
libata scsi_transport_sas usbcore usb_common scsi_mod
[ 98.459136] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.35-1-lts #1
[ 98.466439] Hardware name: HDS Hitachi Flash Storage/Hitachi Flash Storage, BIOS 19.21 1/22/2016
[ 98.476269] task: ffffffff90012480 task.stack: ffffffff90000000
[ 98.482893] RIP: 0010:r10buf_pool_free+0x38/0xe0 [raid10]
[ 98.488933] RSP: 0018:ffff8f6aff003db8 EFLAGS: 00010206
[ 98.494779] RAX: 0000000000000060 RBX: ffff8f6ae673be00 RCX: ffff8f6aff003e28
[ 98.502754] RDX: 0000000000000000 RSI: ffff8f6ae2b53c80 RDI: ffff8f6ae2b53c80
[ 98.510727] RBP: ffff8f6ae2b53ce0 R08: ffff8f6aff003e2c R09: 0000000000000001
[ 98.518711] R10: 0000000000000080 R11: 0000000000000000 R12: 0000000000000000
[ 98.526695] R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000
[ 98.534679] FS: 0000000000000000(0000) GS:ffff8f6aff000000(0000) knlGS:0000000000000000
[ 98.543734] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 98.550161] CR2: 0000000000000050 CR3: 000000089500a001 CR4: 00000000001606f0
[ 98.558145] Call Trace:
[ 98.560881] <IRQ>
[ 98.563136] put_buf+0x19/0x20 [raid10]
[ 98.567426] end_sync_request+0x6b/0x70 [raid10]
[ 98.572591] end_sync_write+0x9b/0x160 [raid10]
[ 98.577662] blk_update_request+0x78/0x2c0
[ 98.582254] scsi_end_request+0x2c/0x1e0 [scsi_mod]
[ 98.587719] scsi_io_completion+0x22f/0x610 [scsi_mod]
[ 98.593472] blk_done_softirq+0x8e/0xc0
[ 98.597767] __do_softirq+0xde/0x2b3
[ 98.601770] irq_exit+0xae/0xb0
[ 98.605285] do_IRQ+0x81/0xd0
[ 98.608606] common_interrupt+0x7d/0x7d
[ 98.612898] </IRQ>
[ 98.615252] RIP: 0010:cpuidle_enter_state+0xa2/0x2e0
[ 98.620806] RSP: 0018:ffffffff90003e90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff7c
[ 98.630017] RAX: ffff8f6aff01f3c0 RBX: 00000016e5f01e98 RCX: 000000000000001f
[ 98.638748] RDX: 00000016e5f01e98 RSI: fffff8ab2f1ab602 RDI: 0000000000000000
[ 98.647482] RBP: ffff8f6aff028a70 R08: 00000000ffffffff R09: 000000000000000a
[ 98.656209] R10: ffffffff90003e70 R11: 000000000000000f R12: 0000000000000001
[ 98.664925] R13: ffffffff900ac098 R14: 0000000000000000 R15: 00000016e5efe111
[ 98.673642] do_idle+0x179/0x1d0
[ 98.677988] cpu_startup_entry+0x6f/0x80
[ 98.683100] start_kernel+0x4ae/0x4ce
[ 98.687910] secondary_startup_64+0xa5/0xb0
[ 98.693298] Code: 45 31 e4 55 53 48 83 ec 08 48 63 46 78 48 89 3c 24 44 8d 68 ff 41 83 fd ff 74 6f 48 8b 34 24 48 c1 e0 05 48 8d 2c 06 4c 8b 75 28 <4d> 8b 66 50 4d 8d 7c 24 08 49 8d 9c 24 88 00 00 00 49 8b 3f 48
[ 98.715888] RIP: r10buf_pool_free+0x38/0xe0 [raid10] RSP: ffff8f6aff003db8
[ 98.724296] CR2: 0000000000000050
[ 98.728716] ---[ end trace 26fbc93e654360aa ]---
[ 98.738337] Kernel panic - not syncing: Fatal exception in interrupt
[ 98.762456] Kernel Offset: 0xe000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 98.762456] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
Looks like it is caused by a wrong bio count: r10buf_pool_alloc()
allocates only two bios for recovery, but r10buf_pool_free() assumes
there are conf->copies of them, so the free loop walks devs[] slots
whose bio was never allocated.
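For context, here is the allocation-side logic the free path has to
match, paraphrased from r10buf_pool_alloc() in 4.14-era
drivers/md/raid10.c (page allocation and error handling elided):

    /* Paraphrased from r10buf_pool_alloc(): a resync needs one bio per
     * copy, but a recovery only needs two (read from a good mirror,
     * write to the re-added device), so in the recovery case
     * devs[j].bio stays NULL for j >= 2.
     */
    if (test_bit(MD_RECOVERY_SYNC, &conf->mddev->recovery) ||
        test_bit(MD_RECOVERY_RESHAPE, &conf->mddev->recovery))
            nalloc = conf->copies; /* resync */
    else
            nalloc = 2; /* recovery */

With --layout=f3 (or n3), conf->copies is 3, so on the free side
get_resync_pages() reads bi_private from the NULL devs[2].bio. That
would also explain the faulting address 0x50 in the oops, plausibly the
offset of bi_private within struct bio on this build.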
Please try the patch below and see whether it works:
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 3c60774c8430..840360a29de0 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -249,10 +249,15 @@ static void r10buf_pool_free(void *__r10_bio, void *data)
 {
 	struct r10conf *conf = data;
 	struct r10bio *r10bio = __r10_bio;
-	int j;
+	int j, nalloc;
 	struct resync_pages *rp = NULL;
 
-	for (j = conf->copies; j--; ) {
+	if (test_bit(MD_RECOVERY_SYNC, &conf->mddev->recovery) ||
+	    test_bit(MD_RECOVERY_RESHAPE, &conf->mddev->recovery))
+		nalloc = conf->copies; /* resync */
+	else
+		nalloc = 2; /* recovery */
+	for (j = nalloc; j--; ) {
 		struct bio *bio = r10bio->devs[j].bio;
 
 		rp = get_resync_pages(bio);
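For what it's worth, this just mirrors the nalloc computation that
r10buf_pool_alloc() already performs, keeping the allocation and free
sides symmetric. One thing I have not verified is whether the
MD_RECOVERY_* flags can change between allocation and free; if they
can, recomputing nalloc at free time could still disagree with what was
actually allocated, and a NULL check on each devs[j].bio before
touching it might be the more robust variant.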
Thanks,
Guoqing