On 06/05/2019 18:07, Song Liu wrote:
>> [...]
>> I understand this could in theory affect all the RAID levels, but in
>> practice I don't think it'll happen. RAID0 is the only "blind" mode of
>> RAID, in the sense it's the only one that doesn't care at all about
>> failures. In fact, this was the origin of my other thread [0], regarding
>> the change of raid0's behavior in error cases, because it currently does
>> not care about members being removed and relies only on filesystem
>> failures (after submitting many BIOs to the removed device).
>>
>> That said, in this change I've only taken care of raid0, since in my
>> understanding the other levels won't submit BIOs to dead devices; we can
>> experiment to see if that's true.
>
> Could you please run a quick test with raid5? I am wondering whether
> some race condition could get us into a similar crash. If we cannot
> easily trigger the bug, we can proceed with this version.
>
> Thanks,
> Song

Hi Song,

I've tested both RAID5 (3 disks, removing one at a time) and RAID1
(2 disks, also removing one at a time); no issues observed on kernel 5.1.

We do see one interesting message in the kernel log, "super_written gets
error=10", which corresponds to md detecting the error (bi_status ==
BLK_STS_IOERROR) and instantly failing the write, making the FS
read-only.

So I think the issue really happens only in RAID0, which writes
"blindly" to its components.

Let me know your thoughts - thanks again for your input!

Cheers,

Guilherme
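
P.S.: for reference, the message comes from super_written() in
drivers/md/md.c - roughly the following (paraphrasing from memory, not
the exact upstream code, so take it as a sketch):

static void super_written(struct bio *bio)
{
        struct md_rdev *rdev = bio->bi_private;
        struct mddev *mddev = rdev->mddev;

        if (bio->bi_status) {
                /* The line we see in the log; bi_status == 10 is
                 * BLK_STS_IOERROR. */
                pr_err("md: super_written gets error=%d\n", bio->bi_status);
                /* Kicks the personality's error handling and marks the
                 * member Faulty, so the superblock write is treated as
                 * failed rather than silently dropped. */
                md_error(mddev, rdev);
        }

        /* (failfast/LastDev details trimmed) */

        if (atomic_dec_and_test(&mddev->pending_writes))
                wake_up(&mddev->sb_wait);
        rdev_dec_pending(rdev, mddev);
        bio_put(bio);
}

RAID0 has no equivalent "mark the member Faulty" path, which is why it
keeps writing blindly.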