Re: [PATCH 6.10 678/809] md/raid1: set max_sectors during early return from choose_slow_rdev()

On Wed, Jul 31, 2024 at 09:43:58PM +0200, Mateusz Jończyk wrote:
> W dniu 30.07.2024 o 17:49, Greg Kroah-Hartman pisze:
> > 6.10-stable review patch.  If anyone has any objections, please let me know.
> >
> > ------------------
> >
> > From: Mateusz Jończyk <mat.jonczyk@xxxxx>
> >
> > commit 36a5c03f232719eb4e2d925f4d584e09cfaf372c upstream.
> >
> > Linux 6.9+ is unable to start a degraded RAID1 array with one drive,
> > when that drive has a write-mostly flag set. During such an attempt,
> > the following assertion in bio_split() is hit:
> >
> > 	BUG_ON(sectors <= 0);
> >
> > Call Trace:
> > 	? bio_split+0x96/0xb0
> > 	? exc_invalid_op+0x53/0x70
> > 	? bio_split+0x96/0xb0
> > 	? asm_exc_invalid_op+0x1b/0x20
> > 	? bio_split+0x96/0xb0
> > 	? raid1_read_request+0x890/0xd20
> > 	? __call_rcu_common.constprop.0+0x97/0x260
> > 	raid1_make_request+0x81/0xce0
> > 	? __get_random_u32_below+0x17/0x70
> > 	? new_slab+0x2b3/0x580
> > 	md_handle_request+0x77/0x210
> > 	md_submit_bio+0x62/0xa0
> > 	__submit_bio+0x17b/0x230
> > 	submit_bio_noacct_nocheck+0x18e/0x3c0
> > 	submit_bio_noacct+0x244/0x670
> >
> > After investigation, it turned out that choose_slow_rdev() does not set
> > the value of max_sectors in some cases and because of it,
> > raid1_read_request calls bio_split with sectors == 0.
> >
> > Fix it by filling in this variable.
> >
> > This bug was introduced in
> > commit dfa8ecd167c1 ("md/raid1: factor out choose_slow_rdev() from read_balance()")
> > but apparently hidden until
> > commit 0091c5a269ec ("md/raid1: factor out helpers to choose the best rdev from read_balance()")
> > shortly thereafter.
> >
> > Cc: stable@xxxxxxxxxxxxxxx # 6.9.x+
> > Signed-off-by: Mateusz Jończyk <mat.jonczyk@xxxxx>
> > Fixes: dfa8ecd167c1 ("md/raid1: factor out choose_slow_rdev() from read_balance()")
> > Cc: Song Liu <song@xxxxxxxxxx>
> > Cc: Yu Kuai <yukuai3@xxxxxxxxxx>
> > Cc: Paul Luse <paul.e.luse@xxxxxxxxxxxxxxx>
> > Cc: Xiao Ni <xni@xxxxxxxxxx>
> > Cc: Mariusz Tkaczyk <mariusz.tkaczyk@xxxxxxxxxxxxxxx>
> > Link: https://lore.kernel.org/linux-raid/20240706143038.7253-1-mat.jonczyk@xxxxx/
> > Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
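
For reference, the fix described above amounts to filling in *max_sectors on
the early-return path of choose_slow_rdev() in drivers/md/raid1.c. A rough
sketch of that path after the fix (illustrative only, not necessarily the
verbatim upstream diff; the local names follow the surrounding helpers
introduced by the read_balance() refactoring):

	/* choose_slow_rdev(): early return when the whole range is readable
	 * from this write-mostly device */
	read_len = raid1_check_read_range(rdev, this_sector, &len);
	if (read_len == r1_bio->sectors) {
		/* fill in *max_sectors so raid1_read_request() does not
		 * end up calling bio_split() with sectors == 0 */
		*max_sectors = read_len;
		update_read_sectors(conf, disk, this_sector, read_len);
		return disk;
	}
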
> 
> Hello,
> 
> FYI, there is a second regression in Linux 6.9 - 6.11, which occurs when a
> new device is added to an array whose RAID component devices have the
> write-mostly flag set. (A write-mostly flag on a device tells the kernel to
> avoid reading from that device, if possible. It is enabled only manually,
> via an mdadm command line switch, and can be beneficial when the devices
> are of different speeds.) The kernel then reads from the wrong component
> device before it is synced, which may result in data corruption.
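
(For context, not part of the original report: the write-mostly flag is
typically set with mdadm's --write-mostly option, for example

	# mark the second member write-mostly at creation time
	mdadm --create /dev/md0 --level=1 --raid-devices=2 \
		/dev/sda1 --write-mostly /dev/sdb1
	# or when adding a member to an existing array
	mdadm /dev/md0 --add --write-mostly /dev/sdb1

where /dev/md0, /dev/sda1 and /dev/sdb1 are placeholder device names.)
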
> 
> Link: https://lore.kernel.org/lkml/9952f532-2554-44bf-b906-4880b2e88e3a@xxxxx/T/
> 
> This is not caused by this patch; the two are linked only in that similar
> functions and the write-mostly flag are involved in both cases. The
> connection is that without this patch, the kernel fails to start (or to
> keep running) a RAID array with a single write-mostly device, so the user
> cannot add another device to it - which is the operation that triggers the
> second regression.
> 
> Paul was of the opinion that this first patch should land nonetheless.
> I would like you to decide whether to ship it now or defer it.

Is there a fix for this anywhere?  If not, being in sync with Linus's
tree is probably the best solution for now.

thanks,

greg k-h



