On 08.10.2024 05:32, Yu Kuai wrote:
On 2024/10/05 10:55, ValdikSS wrote:
On 05.10.2024 04:35, ValdikSS wrote:

Fedora 39 with the 6.10.11-100.fc39 kernel dereferences NULL in raid10_size and locks up with a 3-drive raid10 configuration upon its degradation and reattachment.

How to reproduce:
1. Get 3 USB flash drives
2. mdadm --create -b internal -l 10 -n 3 -z 1G /dev/md0 /dev/sda /dev/sdb /dev/sdc
3. Unplug 2 USB drives
4. Plug one of the drives in again

Happens every time, on every USB flash reattachment.

Reproduced on 6.11.2-250.vanilla.fc39.x86_64

Can you use addr2line or gdb to see which line of code this is?

RIP: 0010:raid10_size+0x15/0x70 [raid10]
It's raid10.c:3768 (6.10.11)
https://github.com/gregkh/linux/blob/8a886bee7aa574611df83a028ab435aeee071e00/drivers/md/raid10.c#L3768

raid10_size(struct mddev *mddev, sector_t sectors, int raid_disks)
{
        sector_t size;
        struct r10conf *conf = mddev->private;

        if (!raid_disks)
-->             raid_disks = min(conf->geo.raid_disks, conf->prev.raid_disks);
        if (!sectors)
                sectors = conf->dev_sectors;
From code review, it looks like this can only happen if raid10_run() returns 0 while mddev->private (the raid10 conf) is still NULL. Can you also give the following patch a test?
It works, thanks a lot! No dereference, no oops.

[ 47.933178] scsi host4: usb-storage 4-1.3:1.0
[ 48.778815] scsi 3:0:0:0: Direct-Access ASolid USB PQ: 0 ANSI: 6
[ 48.779024] scsi 2:0:0:0: Direct-Access ASolid USB PQ: 0 ANSI: 6
[ 48.779968] sd 3:0:0:0: Attached scsi generic sg0 type 0
[ 48.781100] sd 2:0:0:0: Attached scsi generic sg1 type 0
[ 48.782111] sd 3:0:0:0: [sda] 122880001 512-byte logical blocks: (62.9 GB/58.6 GiB)
[ 48.782336] sd 2:0:0:0: [sdb] 122880001 512-byte logical blocks: (62.9 GB/58.6 GiB)
[ 48.782479] sd 3:0:0:0: [sda] Write Protect is off
[ 48.782487] sd 3:0:0:0: [sda] Mode Sense: 23 00 00 00
[ 48.782658] sd 3:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 48.782658] sd 2:0:0:0: [sdb] Write Protect is off
[ 48.782667] sd 2:0:0:0: [sdb] Mode Sense: 23 00 00 00
[ 48.782831] sd 2:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 48.787556] sd 2:0:0:0: [sdb] Attached SCSI removable disk
[ 48.788403] sd 3:0:0:0: [sda] Attached SCSI removable disk
[ 48.931643] md/raid10:md0: not enough operational mirrors.
[ 48.948990] md: pers->run() failed ...
[ 48.969698] scsi 4:0:0:0: Direct-Access ASolid USB PQ: 0 ANSI: 6
[ 48.970382] sd 4:0:0:0: Attached scsi generic sg2 type 0
[ 48.971534] sd 4:0:0:0: [sdc] 122880001 512-byte logical blocks: (62.9 GB/58.6 GiB)
[ 48.971862] sd 4:0:0:0: [sdc] Write Protect is off
[ 48.971868] sd 4:0:0:0: [sdc] Mode Sense: 23 00 00 00
[ 48.972058] sd 4:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 48.976216] sd 4:0:0:0: [sdc] Attached SCSI removable disk
[ 49.067973] md0: ADD_NEW_DISK not supported
Thanks,
Kuai

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index f3bf1116794a..b7f2530ae257 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4061,9 +4061,13 @@ static int raid10_run(struct mddev *mddev)
 	}

 	if (!mddev_is_dm(conf->mddev)) {
-		ret = raid10_set_queue_limits(mddev);
-		if (ret)
+		/* don't overwrite ret on success */
+		int err = raid10_set_queue_limits(mddev);
+
+		if (err) {
+			ret = err;
 			goto out_free_conf;
+		}
 	}

 	/* need to check that every block has at least one working mirror */

Thanks,
Kuai
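
For readers following the thread, here is a minimal userspace sketch of the error-handling pattern the patch appears to address. It is not kernel code: struct conf, struct dev, set_limits(), run_buggy() and run_fixed() are hypothetical stand-ins, and the assumption that ret starts out as an error code and is returned from a shared cleanup label is my reading of the patch comment ("don't overwrite ret on success"), not something confirmed in the thread.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
struct conf { int working_mirrors; };
struct dev  { struct conf *private; };

/* Stand-in for raid10_set_queue_limits(); in this sketch it always succeeds. */
static int set_limits(struct dev *d)
{
	(void)d;
	return 0;
}

/* Buggy shape: the helper's 0 replaces the default error stored in ret. */
static int run_buggy(struct dev *d, int working_mirrors)
{
	int ret = -EIO;
	struct conf *c = calloc(1, sizeof(*c));

	if (!c)
		return -ENOMEM;
	c->working_mirrors = working_mirrors;
	d->private = c;

	ret = set_limits(d);		/* ret becomes 0 on success */
	if (ret)
		goto out_free;

	if (c->working_mirrors < 2)	/* "not enough operational mirrors" */
		goto out_free;		/* ret is still 0 here */

	return 0;

out_free:
	free(c);
	d->private = NULL;
	return ret;			/* may report success with private == NULL */
}

/* Fixed shape: a separate variable keeps the default error intact. */
static int run_fixed(struct dev *d, int working_mirrors)
{
	int ret = -EIO;
	struct conf *c = calloc(1, sizeof(*c));

	if (!c)
		return -ENOMEM;
	c->working_mirrors = working_mirrors;
	d->private = c;

	int err = set_limits(d);	/* don't overwrite ret on success */
	if (err) {
		ret = err;
		goto out_free;
	}

	if (c->working_mirrors < 2)
		goto out_free;		/* ret is still -EIO */

	return 0;

out_free:
	free(c);
	d->private = NULL;
	return ret;
}
```

With one working mirror, run_buggy() returns 0 and leaves d->private set to NULL, which is the state described above for raid10_size(); run_fixed() returns -EIO in the same situation, so a caller would see the failure and not go on to use the freed configuration.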