On Mon, Jul 09, 2018 at 04:10:19PM +0200, Michael Niewöhner wrote:
> That would require a - at least in-memory - bitmap, too.

Sure, but it would remove the need to keep a persistent structure, with all the overhead and possibility of error that entails. I think there was a bug recently that caused data not to be re-synced in certain cases. Sometimes syncing everything is preferable to optimizing for speed and then making the wrong decision because the logic is complex and may miss some corner cases.

> LVM passes trim to the lower layers so no problem here.

It would be about querying free space on the fly. There's no standard way to do that. fstrim is not standard either; each filesystem decides how to go about it - some trim all free space, others only some of it, and some don't support it at all. And for LVM, partitions, etc. it's obviously different again, and other storage layers are possible... (first sketch in the P.S. below). Anyway, wild idea. A bitmap is certainly more in line with what RAID does.

As for the backing device idea, perhaps the thin provisioning target (device mapper / LVM) would work too. That's the only thing in the kernel I can think of that already keeps track of trimmed space. Not sure how much overhead that involves, but if you could build md on top of thin provisioning and then query device mapper for the used regions, that might work as well (second sketch below). Just throwing ideas around.

My personal setup is also different; I like to slice my drives into partitions of the same size, create separate RAIDs from them, and then merge those back together with LVM (each RAID is one PV). So I have several RAIDs like these:

md5 : active raid6 sdf5[8] sde5[7] sdh5[5] sdg5[9] sdd5[3] sdc5[2] sdb5[1]
      1220692480 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/7] [UUUUUUU]

md4 : active raid6 sdf4[8] sde4[7] sdh4[5] sdg4[9] sdd4[3] sdc4[2] sdb4[1]
      1220692480 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/7] [UUUUUUU]

md3 : active raid6 sdf3[8] sde3[7] sdh3[5] sdg3[9] sdd3[3] sdc3[2] sdb3[1]
      1220692480 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/7] [UUUUUUU]

That can be used to speed up disk replacements: provided there are entire PVs without data on them, you can just --assume-clean those segments. Or you can decide which PVs are most important and sync those first. Of course this is a manual process (last sketch below).

But that's super low resolution, as the number of partitions and RAIDs you can run is obviously limited, and each RAID instance comes with metadata offsets that eat into the usable space.

Regards
Andreas Klauer
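
P.S.: A few rough sketches to make the above less hand-wavy. First, the "no standard way to query free space" point - every layer has its own interface, if it has one at all. These are just the ones I can think of; device and mount point names are made up:

    # filesystem level: fstrim trims whatever the fs considers free,
    # how much that actually is depends entirely on the filesystem
    fstrim -v /mnt/data

    # filesystem level: free space as reported by statfs()
    df -B1 /mnt/data

    # LVM level: unallocated extents in a volume group
    vgs -o vg_name,vg_size,vg_free

    # partition level: "free" doesn't even exist inside a partition,
    # only unallocated space in the partition table
    parted /dev/sda unit s print free

None of these tell md anything, which is why I called it a wild idea.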
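
Second, the thin provisioning idea. A minimal sketch, assuming one thin pool per disk and a RAID1 on top; all names are made up and I have no idea what the overhead looks like in practice:

    # one PV / VG / thin pool per disk
    pvcreate /dev/sda1 /dev/sdb1
    vgcreate vg_a /dev/sda1
    vgcreate vg_b /dev/sdb1
    lvcreate --type thin-pool -l 100%FREE -n pool vg_a
    lvcreate --type thin-pool -l 100%FREE -n pool vg_b

    # thin volumes as RAID members
    lvcreate --type thin -V 1T --thinpool pool -n member vg_a
    lvcreate --type thin -V 1T --thinpool pool -n member vg_b
    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
          /dev/vg_a/member /dev/vg_b/member

    # allocation per volume
    lvs -o lv_name,data_percent vg_a vg_b

    # device mapper view: the thin target reports mapped sectors
    dmsetup status vg_a-member

The per-region mappings md would actually need live in the pool metadata; you'd have to pull them out with thin_dump (thin-provisioning-tools) or teach md to query the dm-thin target directly, and that plumbing doesn't exist today.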
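
Last, the manual disk replacement trick with sliced PVs, roughly. Warning: recreating an array with --assume-clean is only harmless here because the PV carries no data, and the metadata version, data offset and device order all have to match the original creation (check with mdadm --examine first), otherwise you lose the PV label as well:

    # which PVs actually hold data?
    pvs -o pv_name,pv_used,pv_free

    # empty PV: skip the resync entirely by recreating the array
    # (device order below is illustrative - verify with mdadm --examine)
    mdadm --stop /dev/md5
    mdadm --create /dev/md5 --assume-clean --metadata=1.2 --level=6 \
          --chunk=512 --raid-devices=7 \
          /dev/sdb5 /dev/sdc5 /dev/sdd5 /dev/sdg5 /dev/sdh5 /dev/sde5 /dev/sdf5

    # PVs with data: replace the disk normally and let md resync;
    # arrays sharing the same disks recover one at a time anyway
    # (the waiting ones show resync=DELAYED in /proc/mdstat)
    mdadm /dev/md3 --add /dev/sdi3
    cat /proc/mdstat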