On Mon, 2018-07-09 at 18:06 +0200, Andreas Klauer wrote:
> On Mon, Jul 09, 2018 at 04:10:19PM +0200, Michael Niewöhner wrote:
> > That would require a - at least in-memory - bitmap, too.
> 
> Sure, but it would remove the need of keeping a persistent structure
> with all the overhead and possibility of error.

As I said, you really wouldn't want to do this on a degraded array
because of the risk of losing all your data. That is what RAID tries
to avoid.

> I think there recently was a bug that caused data to not be
> re-synced in certain cases. Sometimes syncing everything is
> preferable to optimizing speed and then making a wrong decision
> because the logic is complex and may ignore some corner cases.
> 
> > LVM passes trim to the lower layers so no problem here.
> > It would be about querying free space on the fly.
> 
> There's no standard way to do that. fstrim is not standard either,
> each filesystem decides how to go about it, some trim all free space,
> others do it sometimes, or not support it at all. And for LVM, partitions,
> etc. it's obviously different, and other storage layers are possible...

Yes, but I want to get the best out of what we know in this layer. It
doesn't matter if some not-yet-trimmed blocks are missed, because the
upper layer will trim them at some later point anyway. That will only be
some megabytes or maybe a few gigabytes... We only have to be VERY SURE
that no allocated block is skipped.

> Anyway, wild idea. Bitmap is certainly more in line with what RAID does.
> 
> As for the backing device idea, perhaps the thin provisioning target
> (device mapper / LVM) would work too. That's the only thing in kernel
> that already keeps track of trimmed space that I can think of.
> 
> Not sure how much overhead that involves, but if you could build md
> on top of thin-provisioning then query device mapper for used region,
> that might work too. Just throwing ideas around.

Hmm.. not sure what you mean. LVM thin volumes as md-raid member
devices?

> My personal setup is also different; I like to slice my drives into
> partitions of same size to create separate RAIDs with, then merge
> that together with LVM (each RAID is one PV).
> 
> So I have several RAIDs like these:
> 
> md5 : active raid6 sdf5[8] sde5[7] sdh5[5] sdg5[9] sdd5[3] sdc5[2] sdb5[1]
>       1220692480 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/7] [UUUUUUU]
> 
> md4 : active raid6 sdf4[8] sde4[7] sdh4[5] sdg4[9] sdd4[3] sdc4[2] sdb4[1]
>       1220692480 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/7] [UUUUUUU]
> 
> md3 : active raid6 sdf3[8] sde3[7] sdh3[5] sdg3[9] sdd3[3] sdc3[2] sdb3[1]
>       1220692480 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/7] [UUUUUUU]
> 
> That can be used to speed up disk replacements, provided there are
> entire PV without data, you can just --assume-clean those segments.
> Or you can decide which PV are most important and sync those first.
> Of course this is a manual process.

Great if that works for you :-) For me, and maybe for others, it would
not, since I only have one PV. I want to find a general solution to make
disk replacements faster and thereby safer. Also, this seems more like a
"hack" to me.

> But that's super low resolution, as the number of partitions
> and RAIDs you can run is obviously limited, and each RAID instance
> comes with metadata offsets that take away usable space.
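To make a bit more concrete what I meant above about relying on the trim
that comes down from the filesystem: all fstrim really does is issue the
FITRIM ioctl on the mountpoint, the filesystem decides which free ranges
it actually discards, and LVM/md only ever see the resulting discard
requests. Rough sketch only, not the real fstrim code, error handling
cut down:

/* fitrim-sketch.c - minimal illustration of how fstrim asks a
 * filesystem to discard its free space. The filesystem picks the
 * ranges; lower layers (LVM, md, the disks) only see the resulting
 * discard requests.
 *
 * Build: gcc -o fitrim-sketch fitrim-sketch.c
 * Run:   ./fitrim-sketch /mnt/point   (needs root)
 */
#include <fcntl.h>
#include <linux/fs.h>   /* FITRIM, struct fstrim_range */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <mountpoint>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct fstrim_range range;
    memset(&range, 0, sizeof(range));
    range.start  = 0;
    range.len    = UINT64_MAX;  /* "everything you consider free" */
    range.minlen = 0;

    /* Fails with EOPNOTSUPP if the filesystem cannot trim at all. */
    if (ioctl(fd, FITRIM, &range) < 0) {
        perror("FITRIM");
        close(fd);
        return 1;
    }

    /* On return the kernel updates range.len to the bytes trimmed. */
    printf("trimmed %llu bytes\n", (unsigned long long)range.len);

    close(fd);
    return 0;
}

On the md side the missing piece would then "only" be remembering which
regions those discards (and normal writes) have touched - basically the
bitmap we were talking about.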