Re: RFC: use TRIM data from filesystems to speed up array rebuild?


 



On 06/09/12 19:42, David Brown wrote:
On 06/09/12 19:17, Benjamin ESTRABAUD wrote:
On 04/09/12 21:24, NeilBrown wrote:
On Tue, 04 Sep 2012 15:11:26 -0400 Ric Wheeler <ricwheeler@xxxxxxxxx> wrote:

On 09/04/2012 02:06 PM, Chris Friesen wrote:
Hi,

I'm not really a filesystem guy so this may be a really dumb question.

We currently have an issue where we have a ~1TB RAID1 array that is mostly given over to LVM. If we swap one of the disks it will rebuild everything, even though we may only be using a small fraction of the space.

This got me thinking. Has anyone given thought to using the TRIM information from filesystems to allow the RAID code to maintain a bitmask of used disk blocks and only sync the ones that are actually used?

Presumably this bitmask would itself need to be stored on the disk.

Thanks,
Chris
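
A minimal user-space sketch of that idea, assuming one bit per 1 MiB region: the bit is set when the region is written, cleared when a discard (TRIM) covers the whole region, and a rebuild copies only regions whose bit is set. The region size, the names, and the fact that it lives in user space are all illustrative; md does not implement this.

#include <stdint.h>
#include <stdio.h>

/* Illustrative only: one bit per 1 MiB region of the array, set on
 * write, cleared when a discard covers the region completely, so a
 * rebuild can skip regions that hold no live data.  In a real
 * implementation the bitmap would live in the array metadata. */
#define REGION_SHIFT 20ULL                      /* 1 MiB regions    */
#define REGION_SIZE  (1ULL << REGION_SHIFT)
#define NREGIONS     (1024ULL * 1024)           /* enough for ~1 TB */

static uint8_t used[NREGIONS / 8];

static void mark_used(uint64_t off, uint64_t len)
{
    if (len == 0)
        return;
    for (uint64_t r = off >> REGION_SHIFT;
         r <= (off + len - 1) >> REGION_SHIFT; r++)
        used[r / 8] |= 1u << (r % 8);
}

static void mark_discarded(uint64_t off, uint64_t len)
{
    /* Only regions completely covered by the discard may be cleared. */
    uint64_t first = (off + REGION_SIZE - 1) >> REGION_SHIFT;
    uint64_t last  = (off + len) >> REGION_SHIFT;       /* exclusive */
    for (uint64_t r = first; r < last; r++)
        used[r / 8] &= ~(1u << (r % 8));
}

static int needs_rebuild(uint64_t r)
{
    return (used[r / 8] >> (r % 8)) & 1;
}

int main(void)
{
    mark_used(3 * REGION_SIZE, 2 * REGION_SIZE);   /* filesystem wrote here  */
    mark_discarded(4 * REGION_SIZE, REGION_SIZE);  /* ...then trimmed part   */

    for (uint64_t r = 0; r < 8; r++)               /* rebuild skips clear bits */
        printf("region %llu: %s\n", (unsigned long long)r,
               needs_rebuild(r) ? "copy" : "skip");
    return 0;
}
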

Device mapper has a "thin" target now that tracks blocks that are allocated or free (and works with discard).

That might be a basis for doing a focused RAID rebuild, I wonder how....
Maybe the block-layer interface could grow something equivalent to "SEEK_HOLE" and friends so that the upper level can find "holes" and "allocated space" in the underlying device.

I wonder if it is time to discard the 'block device' abstraction and just use files everywhere .... but I seriously doubt it.

NeilBrown
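
For reference, SEEK_DATA and SEEK_HOLE already exist for regular files via lseek(2); the sketch below just demonstrates those file-level semantics, which a block-device equivalent would presumably mirror. It is illustrative, not part of any proposed patch.

#define _GNU_SOURCE        /* for SEEK_DATA / SEEK_HOLE on glibc */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Print the allocated extents of a (possibly sparse) file.  A rebuild
 * sitting on top of a block-layer equivalent of this interface would
 * only need to copy the "allocated" ranges it reports. */
int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    off_t end = lseek(fd, 0, SEEK_END);
    off_t off = 0;

    while (off < end) {
        off_t data = lseek(fd, off, SEEK_DATA);    /* next allocated byte    */
        if (data < 0)
            break;                                 /* nothing but holes left */
        off_t hole = lseek(fd, data, SEEK_HOLE);   /* end of that extent     */
        printf("allocated: %lld .. %lld\n", (long long)data, (long long)hole);
        off = hole;
    }

    close(fd);
    return 0;
}
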
Hi,

I've got a brief question about this feature that seems extremely
promising:

You mentioned on your blog:

"A 'write' to a non-in-sync region should cause that region to be
resynced. Writing zeros would in some sense be ideal, but to do that we
would have to block the write, which would be unfortunate."

So, if we had a write to a "non-in-sync" region (let's imagine the bitmap allows for 1M granularity), would we compute the parity of every stripe that this write "touches" and update it? And is the point of zeroing the region to save the time spent reading and re-writing the data on those stripes to compute parity - and on any other stripes covered by this "non-in-sync" region that the write itself doesn't touch - so that the entire region can then be flipped to "clean"?

That would, I think, be correct. All zeros are the easiest to calculate - the parities (raid5 and raid6) are all zeros too. It is also the ideal pattern to write to SSDs - many SSDs these days implement transparent compression, and you don't get more compressible than zeros!
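
A toy illustration of the all-zeros point for RAID5: parity is the XOR of the data blocks, so a stripe of zeroed data blocks has an all-zero parity block as well, and the RAID6 Q syndrome (a GF(2^8)-weighted sum of the data blocks) is likewise zero. The disk count and chunk size below are arbitrary.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy RAID5 stripe: P = D0 ^ D1 ^ D2 ^ D3.  With every data block
 * zeroed, the parity block is zero as well, so an all-zero region can
 * be declared in-sync without reading or computing anything further. */
#define NDATA 4
#define CHUNK 16

int main(void)
{
    uint8_t data[NDATA][CHUNK];
    uint8_t parity[CHUNK];

    memset(data, 0, sizeof(data));      /* a freshly "zeroed" region */
    memset(parity, 0, sizeof(parity));

    for (int d = 0; d < NDATA; d++)
        for (int i = 0; i < CHUNK; i++)
            parity[i] ^= data[d][i];    /* classic RAID5 XOR parity */

    for (int i = 0; i < CHUNK; i++)
        if (parity[i] != 0)
            return 1;                   /* would mean non-zero parity */

    printf("parity of an all-zero stripe is itself all zeros\n");
    return 0;
}
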


Would this open the door to some "thin provisioned" MD RAID, where one could grow the underlying devices (in the case of a RAID built on top of, say, LVM devices) and mark the new "space" as "non-in-sync", without disrupting (slowing) operations on the array with a sync?


Yes, that would work. More importantly (because it would affect more people), it means that the creation of a md raid array on top of disks or partitions will immediately be "in sync", and there would be no need for a long and effectively useless re-sync process at creation.
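
A rough sketch of what such region tracking could look like, with invented names (including the zero_fill_region() helper, which is hypothetical): regions added by a grow, or present at creation, start out not-in-sync and cost nothing until the first write, which brings just that one region in sync.

#include <stdint.h>
#include <stdlib.h>

/* Sketch of per-region state for "grow (or create) without resync".
 * All names here are invented for the illustration. */
enum region_state { IN_SYNC, NOT_IN_SYNC };

struct array_map {
    enum region_state *state;
    uint64_t nregions;
};

/* Growing the array just adds NOT_IN_SYNC entries; no resync pass. */
static void grow(struct array_map *m, uint64_t new_nregions)
{
    enum region_state *s = realloc(m->state, new_nregions * sizeof(*s));
    if (!s)
        return;                     /* keep the sketch simple on failure */
    m->state = s;
    for (uint64_t r = m->nregions; r < new_nregions; r++)
        m->state[r] = NOT_IN_SYNC;
    m->nregions = new_nregions;
}

/* First write into a NOT_IN_SYNC region triggers a one-off,
 * region-sized sync; afterwards the region is IN_SYNC for good. */
static void on_write(struct array_map *m, uint64_t region)
{
    if (m->state[region] == NOT_IN_SYNC) {
        /* zero_fill_region(region);   hypothetical helper */
        m->state[region] = IN_SYNC;
    }
}

int main(void)
{
    struct array_map m = { NULL, 0 };

    grow(&m, 1024);     /* array created or grown: usable immediately */
    on_write(&m, 7);    /* only region 7 ever needs syncing */

    free(m.state);
    return 0;
}
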

In any case, seems like a great feature.

Yes indeed.


Regards,
Ben.

Thank you very much for your reply!

Regards,
Ben.

