Bugfix / feature requests for raid5cache (writeback)

Hello all
(especially Song Liu and Shaohua Li),
there was just a thread about raid5cache on this list, which prompted me to write. I have read the raid5cache code a bit and have a few requests, part bugfix and part feature. In order of decreasing importance:

1- Would you fix this?
https://www.spinics.net/lists/raid/msg61331.html
"raid5-cache: deeply broken (with write-back?)"
It may be fixed by the following patch:
https://www.spinics.net/lists/raid/msg60713.html
but that patch is currently not applied upstream as of the latest v4.20.4.
The bug is serious (the raid becomes unmountable), all the more so because the writeback cache can be enormous and partially full, and it is currently never written back completely during idle times (see point "Write back during idle times" below).


2- Workaround for liar disks
As you know, many disks lie about flushes, especially SSDs: they acknowledge a flush before the data is actually durable. This easily corrupts a RAID array, because after a power loss the members of the array have different ideas of which last writes actually happened. Testing for liar disks is very difficult (somewhat feasible with diskchecker.pl from PostgreSQL), and unfortunately no hardware review website currently does it.
Lying can happen on both the cache disk and the RAID member disks.
Lying by the cache disk probably cannot be worked around from here, but lying by the RAID disks could. There should be an additional pointer into the log (replay_ptr) which stays at least XX MB and at least YY seconds (both configurable) behind the last flush to the RAID disks, and that area of the log, from replay_ptr to the last flush, should still be considered occupied and must not be overwritten/reclaimed. This way, if there is a power loss, the area from replay_ptr onwards will eventually be replayed to the RAID disks, overwriting whatever the disks claimed to have flushed but actually lost. I guess battery-backed RAID controllers do something like this, as they are known to usually fix the liar disk problem.
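To make the idea concrete, here is a rough kernel-style sketch of the reclaim guard I have in mind. This is not based on the actual raid5-cache internals: every name below (struct log_guard, flush_ptr, guard_bytes and so on) is invented for illustration, and log wraparound is ignored for simplicity.

#include <linux/types.h>
#include <linux/jiffies.h>

/* Hypothetical reclaim guard; not the real raid5-cache code. */
struct log_guard {
        sector_t replay_ptr;            /* oldest log entry we may still replay */
        sector_t flush_ptr;             /* log position of the last flush to the RAID disks */
        u64 guard_bytes;                /* keep at least this much log behind flush_ptr (the "XX MB") */
        unsigned int guard_secs;        /* and for at least this long (the "YY seconds") */
        unsigned long last_flush_jiffies;
};

/* Reclaim is bounded by replay_ptr, never by the flush position. */
static bool guard_can_reclaim(struct log_guard *g, sector_t pos)
{
        return pos < g->replay_ptr;
}

/*
 * Advance replay_ptr lazily: only when the last flush is old enough
 * and the guarded window is larger than guard_bytes.  Everything at
 * or before flush_ptr was then flushed at least guard_secs ago.
 */
static void guard_advance(struct log_guard *g)
{
        u64 behind = (u64)(g->flush_ptr - g->replay_ptr) << 9;  /* sectors to bytes */

        if (behind > g->guard_bytes &&
            time_after(jiffies, g->last_flush_jiffies + g->guard_secs * HZ))
                g->replay_ptr = g->flush_ptr - (g->guard_bytes >> 9);
}

After a power loss, recovery would then replay the [replay_ptr, flush_ptr) window to the array even though the disks had already acknowledged flushing it.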


3- Write back during idle times
It seems to me that with the current code the cache will stay non-empty forever, even under a low rate of writes: AFAICS raid5cache does not take advantage of idle moments to write back (clean) itself completely to the array. You might want to use those moments, because raid5cache apparently cannot coalesce random writes from distant points of the log anyway, so there is no point in waiting. For example: if there are random writes around sector 10000, then other writes elsewhere, and then after some time more random writes around sector 10000, it seems to me raid5cache cannot coalesce the two groups of writes around sector 10000, so it probably makes sense to write back the first group as soon as there is idle time, no? The current behaviour also greatly worsens the case of losing the cache disk, which I know is normally regarded as catastrophic, but could be "less catastrophic" anyway, and can happen even due to a software bug, such as point #1 above.
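As a sketch of the policy I mean (again with invented names, not the actual raid5-cache writeback machinery): arm a timer on every incoming write, and if it expires with no new write in between, drain the cache completely instead of waiting for the usual space thresholds.

#include <linux/timer.h>
#include <linux/workqueue.h>
#include <linux/jiffies.h>

/* Hypothetical idle-drain policy; names invented for illustration. */
#define IDLE_DRAIN_MS 5000      /* configurable: "idle" = 5 s without writes */

static struct timer_list idle_timer;
static struct work_struct drain_work;

static void drain_work_fn(struct work_struct *w)
{
        /* Walk the log in order and write every dirty stripe back to
         * the array, ignoring the usual reclaim thresholds.  In real
         * code this would reuse whatever writeback path reclaim uses
         * today, just triggered unconditionally. */
}

/* Fires only if no write arrived for IDLE_DRAIN_MS. */
static void idle_timer_fn(struct timer_list *t)
{
        schedule_work(&drain_work);     /* drain in process context */
}

/* Called from the write path: every new write postpones the drain. */
static void note_write_activity(void)
{
        mod_timer(&idle_timer, jiffies + msecs_to_jiffies(IDLE_DRAIN_MS));
}

static void idle_drain_init(void)
{
        timer_setup(&idle_timer, idle_timer_fn, 0);
        INIT_WORK(&drain_work, drain_work_fn);
}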


Thanks for your work
N.B.



