On Mon, 2018-07-09 at 18:06 +0200, Andreas Klauer wrote:
> On Mon, Jul 09, 2018 at 04:10:19PM +0200, Michael Niewöhner wrote:
> > That would require a - at least in-memory - bitmap, too.
> 
> Sure, but it would remove the need of keeping a persistent structure
> with all the overhead and possibility of error.

As I said, you really wouldn't want to do this on a degraded array
because of the risk of losing all your data. That is what RAID tries
to avoid.

> I think there recently was a bug that caused data to not be
> re-synced in certain cases. Sometimes syncing everything is
> preferable to optimizing speed and then making a wrong decision
> because the logic is complex and may ignore some corner cases.
> 
> > LVM passes trim to the lower layers so no problem here.
> > It would be about querying free space on the fly.
> 
> There's no standard way to do that. fstrim is not standard either,
> each filesystem decides how to go about it, some trim all free space,
> others do it sometimes, or not support it at all. And for LVM, partitions,
> etc. it's obviously different, and other storage layers are possible...

Yes, but I want to get the best out of what we know in this layer. It
doesn't matter if some not-yet-trimmed blocks are missed, because the
upper layer will trim them at some later point anyway. That will only be
some megabytes or maybe a few gigabytes... We only have to be VERY SURE
that no allocated block is skipped.

> Anyway, wild idea. Bitmap is certainly more in line with what RAID does.
> 
> As for the backing device idea, perhaps the thin provisioning target
> (device mapper / LVM) would work too. That's the only thing in kernel
> that already keeps track of trimmed space that I can think of.
> 
> Not sure how much overhead that involves, but if you could build md
> on top of thin-provisioning then query device mapper for used region,
> that might work too. Just throwing ideas around.

Hmm.. not sure what you mean. LVM thin volumes as md-raid member
devices?

> My personal setup is also different; I like to slice my drives into
> partitions of same size to create separate RAIDs with, then merge
> that together with LVM (each RAID is one PV).
> 
> So I have several RAIDs like these:
> 
> md5 : active raid6 sdf5[8] sde5[7] sdh5[5] sdg5[9] sdd5[3] sdc5[2] sdb5[1]
>       1220692480 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/7] [UUUUUUU]
> 
> md4 : active raid6 sdf4[8] sde4[7] sdh4[5] sdg4[9] sdd4[3] sdc4[2] sdb4[1]
>       1220692480 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/7] [UUUUUUU]
> 
> md3 : active raid6 sdf3[8] sde3[7] sdh3[5] sdg3[9] sdd3[3] sdc3[2] sdb3[1]
>       1220692480 blocks super 1.2 level 6, 512k chunk, algorithm 2 [7/7] [UUUUUUU]
> 
> That can be used to speed up disk replacements, provided there are
> entire PV without data, you can just --assume-clean those segments.
> Or you can decide which PV are most important and sync those first.
> Of course this is a manual process.

Great if that works for you :-) For me, and maybe for others, it would
not, since I only have one PV. I want to find a general solution to make
disk replacements faster and thereby safer. Also, this seems more like a
"hack" to me.

> But that's super low resolution, as the number of partitions
> and RAIDs you can run is obviously limited, and each RAID instance
> comes with metadata offsets that take away usable space.
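To make a bit more concrete what I meant above about relying on the trim
that comes down from the filesystem: all fstrim really does is issue the
FITRIM ioctl on the mountpoint, the filesystem decides which free ranges
it actually discards, and LVM/md only ever see the resulting discard
requests. Rough sketch only, not the real fstrim code, error handling
cut down:

/* fitrim-sketch.c - minimal illustration of how fstrim asks a
 * filesystem to discard its free space. The filesystem picks the
 * ranges; lower layers (LVM, md, the disks) only see the resulting
 * discard requests.
 *
 * Build: gcc -o fitrim-sketch fitrim-sketch.c
 * Run:   ./fitrim-sketch /mnt/point   (needs root)
 */
#include <fcntl.h>
#include <linux/fs.h>   /* FITRIM, struct fstrim_range */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <mountpoint>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct fstrim_range range;
    memset(&range, 0, sizeof(range));
    range.start  = 0;
    range.len    = UINT64_MAX;  /* "everything you consider free" */
    range.minlen = 0;

    /* Fails with EOPNOTSUPP if the filesystem cannot trim at all. */
    if (ioctl(fd, FITRIM, &range) < 0) {
        perror("FITRIM");
        close(fd);
        return 1;
    }

    /* On return the kernel updates range.len to the bytes trimmed. */
    printf("trimmed %llu bytes\n", (unsigned long long)range.len);

    close(fd);
    return 0;
}

On the md side the missing piece would then "only" be remembering which
regions those discards (and normal writes) have touched - basically the
bitmap we were talking about.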