On Mon, 2018-07-09 at 15:00 +0200, Michael Niewöhner wrote:
> On Sun, 2018-07-08 at 19:23 +0100, Wols Lists wrote:
> > The problem with this is that (a) MD has no idea which bits of the disk
> > are in use and which are not, and (b) just copying the bits that are in
> > use will leave the unused bits in an unsync'd state, which will then
> > moan blue murder on a check.
> >
> > It would make sense to prioritise that used stuff, but you do need to
> > copy everything. If, however, we could get TRIM to actually zero the
> > disk, that might well speed things up. It will need a lot of thinking
> > through, though. It's a lot trickier than it sounds.
>
> TRIM does not guarantee that the block is physically zeroed on disk.
> We would not even want this because it would waste time.
> There are disks with the DRAT/RZAT feature that guarantee to return zero
> for previously cleared/trimmed blocks, but not all disks support that.
> Some disks even return random data for trimmed blocks.
>
> I think both problems can be solved by keeping track of the blocks used by
> the upper layer in a bitmap-like structure in the metadata. That bitmap
> needs to be redundant, just like the data. I don't know from memory if
> metadata is redundant or per-disk.
> raid-bitmap / write-intent bitmap does something like that but in-memory -
> we need our bitmap to be on disk(s).
>
> When one of the disks gets replaced, first the bitmap has to be synced and
> then the raid data based on that bitmap. An array check would simply ignore
> unused out-of-sync blocks.
>
> Every read and write to such a bitmap-based raid array would need to
> check/alter the bitmap.
>
> One problem I see is that every write will mean two writes: bitmap and
> data. Maybe the bitmap could be held in memory and synced to disk
> periodically, e.g. every 5 seconds? Other ideas welcome.

RZAT is the correct one: Deterministic Read Zero after TRIM.

My description above is basically software-based RZAT in linux-raid.
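To make the idea a bit more concrete, here is a minimal Python sketch of the scheme described above. It is only an illustration under stated assumptions, not md code: every name in it (UsedBlockBitmap, write_block, read_block, resync, check, BLOCK_SIZE) is hypothetical, the disks are modeled as plain Python lists of blocks, and the "on disk" copy of the bitmap is just a bytes object.

```python
BLOCK_SIZE = 4096  # assumed block size, for illustration only


class UsedBlockBitmap:
    """One bit per block: 1 = block holds live data, 0 = unused/trimmed."""

    def __init__(self, num_blocks, flush_interval=5.0):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)
        self.flush_interval = flush_interval  # the "every 5 seconds" idea
        self.dirty = False
        self.last_flush = 0.0
        self.on_disk = bytes(self.bits)  # stand-in for the persisted copy

    def _check(self, block):
        if not 0 <= block < self.num_blocks:
            raise IndexError(block)

    def mark_used(self, block):
        self._check(block)
        mask = 1 << (block % 8)
        if not self.bits[block // 8] & mask:
            self.bits[block // 8] |= mask
            self.dirty = True

    def mark_unused(self, block):
        """What a TRIM/discard from the upper layer would trigger."""
        self._check(block)
        mask = 1 << (block % 8)
        if self.bits[block // 8] & mask:
            self.bits[block // 8] &= ~mask & 0xFF
            self.dirty = True

    def is_used(self, block):
        self._check(block)
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def used_blocks(self):
        return [b for b in range(self.num_blocks) if self.is_used(b)]

    def maybe_flush(self, now):
        """Persist the bitmap if it is dirty and the interval has elapsed."""
        if self.dirty and now - self.last_flush >= self.flush_interval:
            self.on_disk = bytes(self.bits)
            self.dirty = False
            self.last_flush = now
            return True
        return False


def write_block(bitmap, disks, block, data):
    """Every write alters the bitmap, then writes to all mirrors."""
    bitmap.mark_used(block)
    for disk in disks:
        disk[block] = data


def read_block(bitmap, disk, block):
    """Software RZAT: unused blocks read as zeros, whatever the disk holds."""
    if not bitmap.is_used(block):
        return bytes(BLOCK_SIZE)
    return disk[block]


def resync(bitmap, source, target):
    """Rebuild a replacement disk by copying only the used blocks."""
    copied = 0
    for block in bitmap.used_blocks():
        target[block] = source[block]
        copied += 1
    return copied


def check(bitmap, disk_a, disk_b):
    """Return mismatching used blocks; unused blocks are ignored."""
    return [b for b in bitmap.used_blocks() if disk_a[b] != disk_b[b]]
```

One thing the sketch makes visible: with a lazily flushed bitmap, a crash between mark_used() and the next flush would leave freshly written blocks marked as unused in the persisted copy, so the flush interval trades write overhead against that window.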