On 04/25/2018 12:58 AM, Alexis Castilla wrote:
Hi, community!
When trying to re-add an already-removed disk to a RAID 10 array, the
kernel crashes.
Running on x86_64 (Intel Xeon) and Arch Linux.
I've written a script that reproduces the issue:
#!/bin/sh
# Wipe any stale RAID metadata from the three partitions.
mdadm --zero-superblock /dev/sdp4
mdadm --zero-superblock /dev/sdo4
mdadm --zero-superblock /dev/sdn4
# Create a 3-device RAID10 with the far-3 layout (three copies of the data).
mdadm --create -vvv --force --run --metadata=1.2 /dev/md0 --level=10 \
    --chunk=128 --layout=f3 --raid-devices=3 /dev/sdp4 /dev/sdo4 /dev/sdn4
sleep 5
# Fail and remove one member, then re-add it; the crash happens during
# the recovery that the re-add kicks off.
mdadm --fail -vvv /dev/md0 sdn4
sleep 5
mdadm --remove -vvv /dev/md0 sdn4
sleep 5
mdadm --add /dev/md0 /dev/sdn4
That is enough to trigger this issue.
It seems related to the RAID layout configuration: I cannot reproduce it
with f2 or n2, but it does trigger with f3 or n3, i.e., with layouts
that keep three copies.
Tested with mdadm 4.0.
Also tested on kernels 4.14.18, 4.14.35, and 4.16.3; the issue is the
same on all of them.
Has anyone seen something similar?
I'm not a kernel expert, but I will try to help as far as I can.
Thanks.
[ 55.951379] md/raid10:md0: Disk failure on sdn4, disabling device.
[ 55.951379] md/raid10:md0: Operation continuing on 2 devices.
[ 88.257917] md/raid10:md0: Disk failure on sdn4, disabling device.
[ 88.257917] md/raid10:md0: Operation continuing on 2 devices.
[ 98.347009] BUG: unable to handle kernel NULL pointer dereference at 0000000000000050
[ 98.355783] IP: r10buf_pool_free+0x38/0xe0 [raid10]
[ 98.361239] PGD 0 P4D 0
[ 98.364076] Oops: 0000 [#1] SMP PTI
[ 98.367979] Modules linked in: raid10 md_mod mlx4_ib mlx4_en
ib_core ipmi_ssif intel_rapl x86_pkg_temp_thermal intel_powerclamp
coretemp mgag200 kvm_intel i2c_algo_bit ttm kvm drm_kms_helper drm
irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc
aesni_intel aes_x86_64 crypto_simd nls_iso8859_1 glue_helper nls_cp437
cryptd agpgart input_leds joydev iTCO_wdt vfat syscopyarea
intel_cstate evdev mousedev iTCO_vendor_support led_class sysfillrect
fat sysimgblt intel_rapl_perf pcspkr mac_hid i2c_i801 lpc_ich
mlx4_core fb_sys_fops e1000e ixgbe mei_me mei devlink mdio ptp
pps_core dca shpchp ipmi_si ipmi_devintf wmi ipmi_msghandler button
sch_fq_codel ip_tables x_tables xfs libcrc32c crc32c_generic sr_mod
ses cdrom enclosure sd_mod hid_generic usbhid hid uas usb_storage isci
ahci libsas libahci
[ 98.447354] ehci_pci ehci_hcd crc32c_intel mpt3sas raid_class
libata scsi_transport_sas usbcore usb_common scsi_mod
[ 98.459136] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.14.35-1-lts #1
[ 98.466439] Hardware name: HDS Hitachi Flash Storage/Hitachi Flash Storage, BIOS 19.21 1/22/2016
[ 98.476269] task: ffffffff90012480 task.stack: ffffffff90000000
[ 98.482893] RIP: 0010:r10buf_pool_free+0x38/0xe0 [raid10]
[ 98.488933] RSP: 0018:ffff8f6aff003db8 EFLAGS: 00010206
[ 98.494779] RAX: 0000000000000060 RBX: ffff8f6ae673be00 RCX: ffff8f6aff003e28
[ 98.502754] RDX: 0000000000000000 RSI: ffff8f6ae2b53c80 RDI: ffff8f6ae2b53c80
[ 98.510727] RBP: ffff8f6ae2b53ce0 R08: ffff8f6aff003e2c R09: 0000000000000001
[ 98.518711] R10: 0000000000000080 R11: 0000000000000000 R12: 0000000000000000
[ 98.526695] R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000
[ 98.534679] FS: 0000000000000000(0000) GS:ffff8f6aff000000(0000) knlGS:0000000000000000
[ 98.543734] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 98.550161] CR2: 0000000000000050 CR3: 000000089500a001 CR4: 00000000001606f0
[ 98.558145] Call Trace:
[ 98.560881] <IRQ>
[ 98.563136] put_buf+0x19/0x20 [raid10]
[ 98.567426] end_sync_request+0x6b/0x70 [raid10]
[ 98.572591] end_sync_write+0x9b/0x160 [raid10]
[ 98.577662] blk_update_request+0x78/0x2c0
[ 98.582254] scsi_end_request+0x2c/0x1e0 [scsi_mod]
[ 98.587719] scsi_io_completion+0x22f/0x610 [scsi_mod]
[ 98.593472] blk_done_softirq+0x8e/0xc0
[ 98.597767] __do_softirq+0xde/0x2b3
[ 98.601770] irq_exit+0xae/0xb0
[ 98.605285] do_IRQ+0x81/0xd0
[ 98.608606] common_interrupt+0x7d/0x7d
[ 98.612898] </IRQ>
[ 98.615252] RIP: 0010:cpuidle_enter_state+0xa2/0x2e0
[ 98.620806] RSP: 0018:ffffffff90003e90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff7c
[ 98.630017] RAX: ffff8f6aff01f3c0 RBX: 00000016e5f01e98 RCX: 000000000000001f
[ 98.638748] RDX: 00000016e5f01e98 RSI: fffff8ab2f1ab602 RDI: 0000000000000000
[ 98.647482] RBP: ffff8f6aff028a70 R08: 00000000ffffffff R09: 000000000000000a
[ 98.656209] R10: ffffffff90003e70 R11: 000000000000000f R12: 0000000000000001
[ 98.664925] R13: ffffffff900ac098 R14: 0000000000000000 R15: 00000016e5efe111
[ 98.673642] do_idle+0x179/0x1d0
[ 98.677988] cpu_startup_entry+0x6f/0x80
[ 98.683100] start_kernel+0x4ae/0x4ce
[ 98.687910] secondary_startup_64+0xa5/0xb0
[ 98.693298] Code: 45 31 e4 55 53 48 83 ec 08 48 63 46 78 48 89 3c 24 44 8d 68 ff 41 83 fd ff 74 6f 48 8b 34 24 48 c1 e0 05 48 8d 2c 06 4c 8b 75 28 <4d> 8b 66 50 4d 8d 7c 24 08 49 8d 9c 24 88 00 00 00 49 8b 3f 48
[ 98.715888] RIP: r10buf_pool_free+0x38/0xe0 [raid10] RSP: ffff8f6aff003db8
[ 98.724296] CR2: 0000000000000050
[ 98.728716] ---[ end trace 26fbc93e654360aa ]---
[ 98.738337] Kernel panic - not syncing: Fatal exception in interrupt
[ 98.762456] Kernel Offset: 0xe000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 98.762456] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
Looks like it is caused by a wrong bio count: r10buf_pool_alloc()
allocates only two bios for recovery, but r10buf_pool_free() assumes
there are conf->copies of them, so the free loop walks devs[] slots
whose bio was never allocated.
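For context, here is the allocation-side logic the free path has to
match, paraphrased from r10buf_pool_alloc() in 4.14-era
drivers/md/raid10.c (page allocation and error handling elided):

    /* Paraphrased from r10buf_pool_alloc(): a resync needs one bio per
     * copy, but a recovery only needs two (read from a good mirror,
     * write to the re-added device), so in the recovery case
     * devs[j].bio stays NULL for j >= 2.
     */
    if (test_bit(MD_RECOVERY_SYNC, &conf->mddev->recovery) ||
        test_bit(MD_RECOVERY_RESHAPE, &conf->mddev->recovery))
            nalloc = conf->copies; /* resync */
    else
            nalloc = 2; /* recovery */

With --layout=f3 (or n3), conf->copies is 3, so on the free side
get_resync_pages() reads bi_private from the NULL devs[2].bio. That
would also explain the faulting address 0x50 in the oops, plausibly the
offset of bi_private within struct bio on this build.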
Please try the patch below and see whether it works:
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 3c60774c8430..840360a29de0 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -249,10 +249,15 @@ static void r10buf_pool_free(void *__r10_bio, void *data)
 {
 	struct r10conf *conf = data;
 	struct r10bio *r10bio = __r10_bio;
-	int j;
+	int j, nalloc;
 	struct resync_pages *rp = NULL;
 
-	for (j = conf->copies; j--; ) {
+	if (test_bit(MD_RECOVERY_SYNC, &conf->mddev->recovery) ||
+	    test_bit(MD_RECOVERY_RESHAPE, &conf->mddev->recovery))
+		nalloc = conf->copies; /* resync */
+	else
+		nalloc = 2; /* recovery */
+	for (j = nalloc; j--; ) {
 		struct bio *bio = r10bio->devs[j].bio;
 
 		rp = get_resync_pages(bio);
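For what it's worth, this just mirrors the nalloc computation that
r10buf_pool_alloc() already performs, keeping the allocation and free
sides symmetric. One thing I have not verified is whether the
MD_RECOVERY_* flags can change between allocation and free; if they
can, recomputing nalloc at free time could still disagree with what was
actually allocated, and a NULL check on each devs[j].bio before
touching it might be the more robust variant.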
Thanks,
Guoqing