On Thu, 25 Jul 2024 09:15:40 +0200
Mateusz Jończyk <mat.jonczyk@xxxxx> wrote:

> On 24 July 2024 23:19:06 CEST, Paul E Luse
> <paul.e.luse@xxxxxxxxxxxxxxx> wrote:
> >On Wed, 24 Jul 2024 22:35:49 +0200
> >Mateusz Jończyk <mat.jonczyk@xxxxx> wrote:
> >
> >> On 22.07.2024 at 07:39, Mateusz Jończyk wrote:
> >> > On 20.07.2024 at 16:47, Mateusz Jończyk wrote:
> >> >> Hello,
> >> >>
> >> >> In my laptop, I used to have two RAID1 arrays on top of NVMe and
> >> >> SATA SSD drives: /dev/md0 for /boot (not partitioned), /dev/md1
> >> >> for the remaining data (LUKS + LVM + ext4). For performance, I
> >> >> marked the RAID component device for /dev/md1 on the SATA SSD
> >> >> drive write-mostly, which "means that the 'md' driver will avoid
> >> >> reading from these devices if at all possible" (man mdadm).
> >> >>
> >> >> Recently, the NVMe drive started having problems (PCI AER errors
> >> >> and the controller disappearing), so I removed it from the
> >> >> arrays and wiped it. However, I have since reseated the drive in
> >> >> the M.2 socket, and this apparently fixed it (verified with
> >> >> tests).
> >> >>
> >> >> $ cat /proc/mdstat
> >> >> Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
> >> >> md1 : active raid1 sdb5[1](W)
> >> >>       471727104 blocks super 1.2 [2/1] [_U]
> >> >>       bitmap: 4/4 pages [16KB], 65536KB chunk
> >> >>
> >> >> md2 : active (auto-read-only) raid1 sdb6[3](W) sda1[2]
> >> >>       3142656 blocks super 1.2 [2/2] [UU]
> >> >>       bitmap: 0/1 pages [0KB], 65536KB chunk
> >> >>
> >> >> md0 : active raid1 sdb4[3]
> >> >>       2094080 blocks super 1.2 [2/1] [_U]
> >> >>
> >> >> unused devices: <none>
> >> >>
> >> >> (md2 was used just for testing, ignore it).
> >> >>
> >> >> Today, I tried to add the drive back to the arrays using a
> >> >> script that executed, in quick succession:
> >> >>
> >> >> mdadm /dev/md0 --add --readwrite /dev/nvme0n1p2
> >> >> mdadm /dev/md1 --add --readwrite /dev/nvme0n1p3
> >> >>
> >> >> This was on Linux 6.10.0, patched with my previous patch:
> >> >>
> >> >> https://lore.kernel.org/linux-raid/20240711202316.10775-1-mat.jonczyk@xxxxx/
> >> >>
> >> >> (which fixed a regression in the kernel and allows it to start
> >> >> /dev/md1 with a single drive in write-mostly mode).
> >> >> In the background, I was running "rdiff-backup --compare", which
> >> >> was comparing data between my array contents and a backup
> >> >> attached via USB.
> >> >>
> >> >> This, however, resulted in mayhem - I was unable to start any
> >> >> program (input/output errors), etc. I used SysRQ + C to save a
> >> >> kernel log:
> >> >>
> >> > Hello,
> >> >
> >> > It is possible that my second SSD has some problems and that high
> >> > read activity during RAID resync triggered them. Reads from that
> >> > drive are now very slow (between 10 and 30 MB/s), which suggests
> >> > that something is not OK.
> >>
> >> Hello,
> >>
> >> Unfortunately, hardware failure seems not to be the case.
> >>
> >> I did test it again on 6.10, twice, and in both cases I got
> >> filesystem corruption (but not as severe).
> >>
> >> On Linux 6.1.96 it seems to be working well (I also did two tries).
> >>
> >> Please note: in my tests, I was using a RAID component device with
> >> the write-mostly bit set. This setup does not work on 6.9+ out of
> >> the box and requires the following patch:
> >>
> >> commit 36a5c03f23271 ("md/raid1: set max_sectors during early
> >> return from choose_slow_rdev()")
> >>
> >> that is in master now.
> >>
> >> It is also heading into stable, which I'm going to interrupt.
> >
> >Hi Mateusz,
> >
> >I'm pretty interested in what is happening here, especially as it
> >relates to write-mostly. A couple of questions for you:
> >
> >1) Are you able to find a simpler reproduction for this, for example
> >without mixing SATA and NVMe? Maybe just use two known good NVMe
> >SSDs and follow your steps to repro?
>
> Hello,
>
> Well, I have three drives in my laptop: NVMe, SATA SSD (in the DVD
> bay) and SATA HDD (platter). I could do tests on top of these two
> SATA drives. But maybe it would be easier for me to bisect (or
> guess-bisect) in the current setup; I haven't made up my mind yet.
>

OK, thanks.

> >2) I don't fully understand your last two statements, maybe you can
> >clarify? With your max_sectors patch does it pass or fail? If it
> >passes, what do you mean by "I'm going to interrupt"? It sounds like
> >you mean the patch doesn't work and you are trying to stop it??
>
> Without this patch I wouldn't be able to do the tests: a degraded
> RAID1 with a single drive in write-mostly mode doesn't start at all.
>
> With my last statement I meant that I was going to stop this patch
> from going to stable kernels. At this point, it doesn't seem to me
> that my patch is the direct cause of the problems or that I missed
> something in it. However, I think that it is currently better to fail
> this setup outright rather than risk somebody's data.
>

OK, I would say please do not try to stop the patch; it is a good fix
and, although it may not completely solve your problem, it should
land. Unless Kwai has another opinion.

-Paul

> I have made further tests:
>
> - vanilla 6.8.0 with a write-mostly drive works correctly,
>
> - vanilla 6.10-rc6 without the write-mostly bit set also works
>   correctly.
>
> So it seems that the problem happens only with the write-mostly mode
> and only after 6.8.0.
>
> Greetings,
>
> Mateusz
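
P.S. On question 1): in case it is useful, below is a rough, untested
sketch of the kind of simpler repro I was thinking of, using loop
devices instead of the real NVMe/SATA drives. The device names
(/dev/md100, the loop devices), the sizes, and the dd/md5sum load are
placeholders rather than your exact setup, so adjust as needed:

  # two loop devices standing in for the NVMe and SATA SSD members
  truncate -s 2G disk0.img disk1.img
  DEV0=$(losetup -f --show disk0.img)   # stand-in for the NVMe member
  DEV1=$(losetup -f --show disk1.img)   # stand-in for the SATA member

  # RAID1 with an internal bitmap, second member flagged write-mostly
  mdadm --create /dev/md100 --run --level=1 --raid-devices=2 \
        --bitmap=internal "$DEV0" --write-mostly "$DEV1"

  # put some known data on the array and record a checksum
  dd if=/dev/urandom of=/dev/md100 bs=1M count=1024 oflag=direct
  md5sum /dev/md100 > before.md5

  # degrade the array so only the write-mostly member is left, and
  # wipe the removed member, as happened with the NVMe drive
  mdadm /dev/md100 --fail "$DEV0"
  mdadm /dev/md100 --remove "$DEV0"
  wipefs -a "$DEV0"

  # re-add the wiped member while generating read load in the
  # background (a stand-in for "rdiff-backup --compare")
  dd if=/dev/md100 of=/dev/null bs=1M iflag=direct &
  mdadm /dev/md100 --add --readwrite "$DEV0"
  wait
  md5sum -c before.md5   # a mismatch would point at bad reads

That obviously assumes a kernel on which the degraded write-mostly
array starts at all, i.e. 6.9+ plus your choose_slow_rdev() fix.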