On 22.07.2024 at 07:39, Mateusz Jończyk wrote:
> On 20.07.2024 at 16:47, Mateusz Jończyk wrote:
>> Hello,
>>
>> In my laptop, I used to have two RAID1 arrays on top of NVMe and SATA SSD
>> drives: /dev/md0 for /boot (not partitioned), /dev/md1 for the remaining
>> data (LUKS + LVM + ext4). For performance, I marked the RAID component
>> device for /dev/md1 on the SATA SSD drive write-mostly, which "means that
>> the 'md' driver will avoid reading from these devices if at all possible"
>> (man mdadm).
>>
>> Recently, the NVMe drive started having problems (PCI AER errors and the
>> controller disappearing), so I removed it from the arrays and wiped it.
>> However, I have reseated the drive in the M.2 socket and this apparently
>> fixed it (verified with tests).
>>
>> $ cat /proc/mdstat
>> Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
>> md1 : active raid1 sdb5[1](W)
>>       471727104 blocks super 1.2 [2/1] [_U]
>>       bitmap: 4/4 pages [16KB], 65536KB chunk
>>
>> md2 : active (auto-read-only) raid1 sdb6[3](W) sda1[2]
>>       3142656 blocks super 1.2 [2/2] [UU]
>>       bitmap: 0/1 pages [0KB], 65536KB chunk
>>
>> md0 : active raid1 sdb4[3]
>>       2094080 blocks super 1.2 [2/1] [_U]
>>
>> unused devices: <none>
>>
>> (md2 was used just for testing, ignore it.)
>>
>> Today, I tried to add the drive back to the arrays using a script that
>> executed, in quick succession:
>>
>> mdadm /dev/md0 --add --readwrite /dev/nvme0n1p2
>> mdadm /dev/md1 --add --readwrite /dev/nvme0n1p3
>>
>> This was on Linux 6.10.0, patched with my previous patch:
>>
>> https://lore.kernel.org/linux-raid/20240711202316.10775-1-mat.jonczyk@xxxxx/
>>
>> (which fixed a regression in the kernel and allows it to start /dev/md1
>> with a single drive in write-mostly mode).
>>
>> In the background, I was running "rdiff-backup --compare", which was
>> comparing the array contents against a backup attached via USB.
>>
>> This, however, resulted in mayhem - I was unable to start any program
>> (input/output errors), etc. I used SysRq + C to save a kernel log:
>>
> Hello,
>
> It is possible that my second SSD has some problems and that the high read
> activity during the RAID resync triggered them. Reads from that drive are
> now very slow (between 10 and 30 MB/s), which suggests that something is
> not OK.

Hello,

Unfortunately, hardware failure does not seem to be the cause. I tested it
again on 6.10, twice, and in both cases I got filesystem corruption (though
not as severe). On Linux 6.1.96 it seems to work well (I also did two tries).

Please note: in my tests, I was using a RAID component device with the
write-mostly bit set. This setup does not work on 6.9+ out of the box and
requires the following patch:

commit 36a5c03f23271 ("md/raid1: set max_sectors during early return from
choose_slow_rdev()")

which is in master now. It is also heading into stable, which I am going to
try to interrupt.

Greetings,
Mateusz
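
P.S. In case anyone wants to reproduce the write-mostly part of this setup:
the flag can be set either when adding a component or at runtime via sysfs.
A minimal sketch (the device names match my setup above and are only
illustrative):

# mark the SATA component of /dev/md1 write-mostly at runtime
echo writemostly > /sys/block/md1/md/dev-sdb5/state
# verify - the state file should now include "writemostly"
cat /sys/block/md1/md/dev-sdb5/state
# alternatively, request the flag while (re-)adding a device
mdadm /dev/md1 --add --write-mostly /dev/nvme0n1p3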