On 24 July 2024 23:19:06 CEST, Paul E Luse <paul.e.luse@xxxxxxxxxxxxxxx> wrote:
>On Wed, 24 Jul 2024 22:35:49 +0200
>Mateusz Jończyk <mat.jonczyk@xxxxx> wrote:
>
>> On 22.07.2024 at 07:39, Mateusz Jończyk wrote:
>> > On 20.07.2024 at 16:47, Mateusz Jończyk wrote:
>> >> Hello,
>> >>
>> >> In my laptop, I used to have two RAID1 arrays on top of NVMe and
>> >> SATA SSD drives: /dev/md0 for /boot (not partitioned), /dev/md1
>> >> for the remaining data (LUKS + LVM + ext4). For performance, I
>> >> marked the RAID component device for /dev/md1 on the SATA SSD
>> >> drive write-mostly, which "means that the 'md' driver will avoid
>> >> reading from these devices if at all possible" (man mdadm).
>> >>
>> >> Recently, the NVMe drive started having problems (PCI AER errors
>> >> and the controller disappearing), so I removed it from the arrays
>> >> and wiped it. However, I have since reseated the drive in the M.2
>> >> socket and this apparently fixed it (verified with tests).
>> >>
>> >> $ cat /proc/mdstat
>> >> Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
>> >> md1 : active raid1 sdb5[1](W)
>> >>       471727104 blocks super 1.2 [2/1] [_U]
>> >>       bitmap: 4/4 pages [16KB], 65536KB chunk
>> >>
>> >> md2 : active (auto-read-only) raid1 sdb6[3](W) sda1[2]
>> >>       3142656 blocks super 1.2 [2/2] [UU]
>> >>       bitmap: 0/1 pages [0KB], 65536KB chunk
>> >>
>> >> md0 : active raid1 sdb4[3]
>> >>       2094080 blocks super 1.2 [2/1] [_U]
>> >>
>> >> unused devices: <none>
>> >>
>> >> (md2 was used just for testing; ignore it.)
>> >>
>> >> Today, I tried to add the drive back to the arrays using a script
>> >> that executed, in quick succession:
>> >>
>> >> mdadm /dev/md0 --add --readwrite /dev/nvme0n1p2
>> >> mdadm /dev/md1 --add --readwrite /dev/nvme0n1p3
>> >>
>> >> This was on Linux 6.10.0, patched with my previous patch:
>> >>
>> >> https://lore.kernel.org/linux-raid/20240711202316.10775-1-mat.jonczyk@xxxxx/
>> >>
>> >> (which fixed a regression in the kernel and allows it to start
>> >> /dev/md1 with a single drive in write-mostly mode).
>> >> In the background, I was running "rdiff-backup --compare", which
>> >> was comparing my array contents against a backup attached via USB.
>> >>
>> >> This, however, resulted in mayhem - I was unable to start any
>> >> program (everything failed with an input-output error, etc.). I
>> >> used SysRq + C to save a kernel log:
>> >>
>> > Hello,
>> >
>> > It is possible that my second SSD has some problems and that high
>> > read activity during RAID resync triggered them. Reads from that
>> > drive are now very slow (between 10 and 30 MB/s), which suggests
>> > that something is not OK.
>>
>> Hello,
>>
>> Unfortunately, hardware failure does not seem to be the cause.
>>
>> I tested it again on 6.10, twice, and in both cases I got filesystem
>> corruption (though not as severe).
>>
>> On Linux 6.1.96 it seems to be working well (also did two tries).
>>
>> Please note: in my tests, I was using a RAID component device with
>> the write-mostly bit set. This setup does not work on 6.9+ out of
>> the box and requires the following patch:
>>
>> commit 36a5c03f23271 ("md/raid1: set max_sectors during early return
>> from choose_slow_rdev()")
>>
>> that is in master now.
>>
>> It is also heading into stable, which I'm going to interrupt.
>
>Hi Mateusz,
>
>I'm pretty interested in what is happening here, especially as it
>relates to write-mostly.
>Couple of questions for you:
>
>1) Are you able to find a simpler reproduction for this, for example
>without mixing SATA and NVMe? Maybe just using two known good NVMe
>SSDs and following your steps to repro?

Hello,

Well, I have three drives in my laptop: an NVMe SSD, a SATA SSD (in the
DVD bay) and a SATA HDD (platter). I could run the tests on top of the
two SATA drives. But it may be easier for me to bisect (or guess-bisect)
in the current setup; I haven't made up my mind yet.

>2) I don't fully understand your last two statements, maybe you can
>clarify? With your max_sectors patch does it pass or fail? If pass,
>what do you mean by "I'm going to interrupt"? It sounds like you mean
>the patch doesn't work and you are trying to stop it??

Without this patch I wouldn't be able to do the tests at all: without
it, a degraded RAID1 with a single drive in write-mostly mode does not
start.

With my last statement I meant that I was going to stop this patch from
going into stable kernels. At this point, it does not seem to me that
my patch is the direct cause of the problems or that I missed something
in it. However, I think it is currently better to make this setup fail
outright rather than risk somebody's data.

I have made further tests:
- vanilla 6.8.0 with a write-mostly drive works correctly,
- vanilla 6.10-rc6 without the write-mostly bit set also works
  correctly.

So it seems that the problem happens only in write-mostly mode and only
after 6.8.0.

Greetings,
Mateusz
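
PS: Regarding a simpler reproduction, below is a rough, untested sketch
of what I have in mind, using loop devices instead of real drives. The
image file names, sizes, the /dev/md100 name and the /mnt mount point
are arbitrary placeholders, and the kernel would need to be one where a
degraded array with only a write-mostly leg can start at all (6.10 plus
the max_sectors patch above).

# create two small backing files and attach them to free loop devices
truncate -s 2G disk0.img disk1.img
LOOP0=$(losetup -f --show disk0.img)
LOOP1=$(losetup -f --show disk1.img)

# RAID1 with the second leg marked write-mostly, mirroring my setup
mdadm --create /dev/md100 --level=1 --raid-devices=2 \
      "$LOOP0" --write-mostly "$LOOP1"
mkfs.ext4 /dev/md100
mount /dev/md100 /mnt
cp -a /usr/share /mnt/       # some data to read back later

# degrade the array so only the write-mostly leg is left, then wipe
# the removed leg, as I did with the NVMe drive
mdadm /dev/md100 --fail "$LOOP0" --remove "$LOOP0"
mdadm --zero-superblock "$LOOP0"

# generate read load in the background (stands in for the
# "rdiff-backup --compare" run from my report)
( while true; do tar cf /dev/null /mnt 2>/dev/null; done ) &
READER=$!

# re-add the wiped leg while reads are in flight
mdadm /dev/md100 --add --readwrite "$LOOP0"

# wait for the recovery to finish, then compare the data
while grep -q recovery /proc/mdstat; do sleep 1; done
kill "$READER"
diff -r /usr/share /mnt/share && echo "data OK"

If the problem is really in the write-mostly read-balancing path, this
should show read errors or mismatches during or after the recovery; if
it only reproduces on real hardware, that would also be useful to know.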