Re: [PATCH 6.10 678/809] md/raid1: set max_sectors during early return from choose_slow_rdev()

Mateusz Jończyk <mat.jonczyk@xxxxx> · Wed, 31 Jul 2024 21:43:58 +0200

On 30.07.2024 at 17:49, Greg Kroah-Hartman wrote:
> 6.10-stable review patch.  If anyone has any objections, please let me know.
>
> ------------------
>
> From: Mateusz Jończyk <mat.jonczyk@xxxxx>
>
> commit 36a5c03f232719eb4e2d925f4d584e09cfaf372c upstream.
>
> Linux 6.9+ is unable to start a degraded RAID1 array with one drive,
> when that drive has a write-mostly flag set. During such an attempt,
> the following assertion in bio_split() is hit:
>
> 	BUG_ON(sectors <= 0);
>
> Call Trace:
> 	? bio_split+0x96/0xb0
> 	? exc_invalid_op+0x53/0x70
> 	? bio_split+0x96/0xb0
> 	? asm_exc_invalid_op+0x1b/0x20
> 	? bio_split+0x96/0xb0
> 	? raid1_read_request+0x890/0xd20
> 	? __call_rcu_common.constprop.0+0x97/0x260
> 	raid1_make_request+0x81/0xce0
> 	? __get_random_u32_below+0x17/0x70
> 	? new_slab+0x2b3/0x580
> 	md_handle_request+0x77/0x210
> 	md_submit_bio+0x62/0xa0
> 	__submit_bio+0x17b/0x230
> 	submit_bio_noacct_nocheck+0x18e/0x3c0
> 	submit_bio_noacct+0x244/0x670
>
> After investigation, it turned out that choose_slow_rdev() does not set
> the value of max_sectors in some cases and because of it,
> raid1_read_request calls bio_split with sectors == 0.
>
> Fix it by filling in this variable.
>
> This bug was introduced in
> commit dfa8ecd167c1 ("md/raid1: factor out choose_slow_rdev() from read_balance()")
> but apparently hidden until
> commit 0091c5a269ec ("md/raid1: factor out helpers to choose the best rdev from read_balance()")
> shortly thereafter.
>
> Cc: stable@xxxxxxxxxxxxxxx # 6.9.x+
> Signed-off-by: Mateusz Jończyk <mat.jonczyk@xxxxx>
> Fixes: dfa8ecd167c1 ("md/raid1: factor out choose_slow_rdev() from read_balance()")
> Cc: Song Liu <song@xxxxxxxxxx>
> Cc: Yu Kuai <yukuai3@xxxxxxxxxx>
> Cc: Paul Luse <paul.e.luse@xxxxxxxxxxxxxxx>
> Cc: Xiao Ni <xni@xxxxxxxxxx>
> Cc: Mariusz Tkaczyk <mariusz.tkaczyk@xxxxxxxxxxxxxxx>
> Link: https://lore.kernel.org/linux-raid/20240706143038.7253-1-mat.jonczyk@xxxxx/
> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>

Hello,

FYI, there is a second regression in Linux 6.9 - 6.11, which occurs with RAID
component devices that have the write-mostly flag set when a new device is
added to the array. (The write-mostly flag on a device tells the kernel to
avoid reading from that device if possible. It is only ever enabled manually,
with an mdadm command line switch, and can be beneficial when the devices are
of different speeds.) The kernel then reads from the wrong component device
before it is synced, which may result in data corruption.

Link: https://lore.kernel.org/lkml/9952f532-2554-44bf-b906-4880b2e88e3a@xxxxx/T/

This is not caused by this patch; the two issues are linked only by similar
functions and by the write-mostly flag being involved in both cases. The catch
is that without this patch, the kernel fails to start or keep running a RAID
array with a single write-mostly device, so the user cannot add another device
to it, which is the step that triggers the second regression.

Paul was of the opinion that this first patch should land nonetheless.
I would like you to decide whether to ship it now or defer it.

Greetings,

Mateusz