Hi,
On 2024/10/08 23:01, ValdikSS wrote:
On 08.10.2024 05:32, Yu Kuai wrote:
On 2024/10/05 10:55, ValdikSS wrote:
On 05.10.2024 04:35, ValdikSS wrote:
Fedora 39 with 6.10.11-100.fc39 kernel dereferences NULL in
raid10_size and locks up with 3-drive raid10 configuration upon its
degradation and reattachment.
How to reproduce:
1. Get 3 USB flash drives
2. mdadm --create -b internal -l 10 -n 3 -z 1G /dev/md0 /dev/sda /dev/sdb /dev/sdc
3. Unplug 2 USB drives
4. Plug one of the drives in again
Happens every time, on every USB flash drive reattachment.
Reproduced on 6.11.2-250.vanilla.fc39.x86_64
Can you use addr2line or gdb to see which line of code this is?
RIP: 0010:raid10_size+0x15/0x70 [raid10]
It's raid10.c:3768 (6.10.11)
https://github.com/gregkh/linux/blob/8a886bee7aa574611df83a028ab435aeee071e00/drivers/md/raid10.c#L3768
raid10_size(struct mddev *mddev, sector_t sectors, int raid_disks)
{
	sector_t size;
	struct r10conf *conf = mddev->private;

	if (!raid_disks)
-->		raid_disks = min(conf->geo.raid_disks,
				 conf->prev.raid_disks);
	if (!sectors)
		sectors = conf->dev_sectors;
From code review, it looks like this can only happen if raid10_run()
returns 0 while mddev->private (the raid10 conf) is still NULL. Can you
also give the following patch a test?
It works, thanks a lot! No dereference, no oops.
Thanks for the test! Will send a patch soon.
Kuai
[ 47.933178] scsi host4: usb-storage 4-1.3:1.0
[ 48.778815] scsi 3:0:0:0: Direct-Access ASolid USB PQ: 0 ANSI: 6
[ 48.779024] scsi 2:0:0:0: Direct-Access ASolid USB PQ: 0 ANSI: 6
[ 48.779968] sd 3:0:0:0: Attached scsi generic sg0 type 0
[ 48.781100] sd 2:0:0:0: Attached scsi generic sg1 type 0
[ 48.782111] sd 3:0:0:0: [sda] 122880001 512-byte logical blocks: (62.9 GB/58.6 GiB)
[ 48.782336] sd 2:0:0:0: [sdb] 122880001 512-byte logical blocks: (62.9 GB/58.6 GiB)
[ 48.782479] sd 3:0:0:0: [sda] Write Protect is off
[ 48.782487] sd 3:0:0:0: [sda] Mode Sense: 23 00 00 00
[ 48.782658] sd 3:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 48.782658] sd 2:0:0:0: [sdb] Write Protect is off
[ 48.782667] sd 2:0:0:0: [sdb] Mode Sense: 23 00 00 00
[ 48.782831] sd 2:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 48.787556] sd 2:0:0:0: [sdb] Attached SCSI removable disk
[ 48.788403] sd 3:0:0:0: [sda] Attached SCSI removable disk
[ 48.931643] md/raid10:md0: not enough operational mirrors.
[ 48.948990] md: pers->run() failed ...
[ 48.969698] scsi 4:0:0:0: Direct-Access ASolid USB PQ: 0 ANSI: 6
[ 48.970382] sd 4:0:0:0: Attached scsi generic sg2 type 0
[ 48.971534] sd 4:0:0:0: [sdc] 122880001 512-byte logical blocks: (62.9 GB/58.6 GiB)
[ 48.971862] sd 4:0:0:0: [sdc] Write Protect is off
[ 48.971868] sd 4:0:0:0: [sdc] Mode Sense: 23 00 00 00
[ 48.972058] sd 4:0:0:0: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 48.976216] sd 4:0:0:0: [sdc] Attached SCSI removable disk
[ 49.067973] md0: ADD_NEW_DISK not supported
Thanks,
Kuai
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index f3bf1116794a..b7f2530ae257 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4061,9 +4061,13 @@ static int raid10_run(struct mddev *mddev)
 	}
 
 	if (!mddev_is_dm(conf->mddev)) {
-		ret = raid10_set_queue_limits(mddev);
-		if (ret)
+		/* don't overwrite ret on success */
+		int err = raid10_set_queue_limits(mddev);
+
+		if (err) {
+			ret = err;
 			goto out_free_conf;
+		}
 	}
 
 	/* need to check that every block has at least one working mirror */
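
For anyone following along, here is a rough user-space sketch of the pattern the patch above deals with. It is not kernel code. The names fake_conf, fake_mddev, set_limits(), enough_mirrors(), run_buggy() and run_fixed() are hypothetical stand-ins, and the -EIO default for ret is an assumption about how raid10_run() initialises it. The point is only that assigning a successful return value to ret lets a later "goto out_free_conf" return 0 with the private pointer already cleared.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the kernel's struct r10conf / struct mddev. */
struct fake_conf { int raid_disks; };
struct fake_mddev { struct fake_conf *private; };

static struct fake_conf the_conf = { 3 };

/* Stand-in for raid10_set_queue_limits(): always succeeds in this sketch. */
static int set_limits(void) { return 0; }

/* Stand-in for the "enough operational mirrors" check: always fails here. */
static int enough_mirrors(void) { return 0; }

/* Old shape: a successful set_limits() overwrites the default error. */
int run_buggy(struct fake_mddev *m)
{
	int ret = -EIO;			/* assumed default error code */

	m->private = &the_conf;
	ret = set_limits();		/* ret becomes 0 on success */
	if (ret)
		goto out_free_conf;
	if (!enough_mirrors())
		goto out_free_conf;	/* ret is 0: failure reported as success */
	return 0;

out_free_conf:
	m->private = NULL;
	return ret;
}

/* Patched shape: a separate variable keeps the default error intact. */
int run_fixed(struct fake_mddev *m)
{
	int ret = -EIO;
	int err;

	m->private = &the_conf;
	err = set_limits();
	if (err) {
		ret = err;
		goto out_free_conf;
	}
	if (!enough_mirrors())
		goto out_free_conf;	/* ret is still -EIO */
	return 0;

out_free_conf:
	m->private = NULL;
	return ret;
}
```

With these stand-ins, run_buggy() returns 0 and leaves m->private NULL, which is the state a later raid10_size()-style dereference would trip over, while run_fixed() returns -EIO so the caller can stop before touching the pointer.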
Thanks,
Kuai