Re: dm-thin f.req. : SEEK_DATA / SEEK_HOLE / SEEK_DISCARD

On 05/03/12 11:14, Joe Thornber wrote:
On Tue, May 01, 2012 at 05:52:45PM +0200, Spelic wrote:
I'm looking at it right now
Well, I was thinking of a parent snapshot and a child snapshot (or
anyway an older and a more recent snapshot of the same device), so
I'm not sure that's the feature I needed... probably I'm missing
something and need to study more
I'm not really following you here.  You can have arbitrary depth of
snapshots (snaps of snaps) if that helps.

I'm not following you either (you pointed me to the external snapshot feature, but this would not be an "external origin", methinks?). It is probably irrelevant anyway, because after seeing the rest of the replies I now finally understand what metadata is available inside dm-thin. Thanks for such clear replies.

With your implementation there is a tension between fragmentation / RAID alignment on one side and the discard handling on the other. With concurrent access to many thin-provisioned devices, a small block size is likely to fragment the pool badly: HDD streaming reads can suffer enormously on fragmented areas (up to a factor of 1000), and on parity RAID write performance would suffer as well. If instead the block size is set large (for example one RAID stripe), block unmapping on discards is unlikely to work, because the filesystem would issue roughly one discard per file and most files are smaller than a thin-pool block. (They are smaller than a RAID stripe too: the usual recommendation is to make the RAID chunk roughly equal to the expected average file size, so the average file, and hence the average discard, would be about 1/N of the thin-pool block size.) In that case nothing would ever be unprovisioned.
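To make the large-block case concrete, here is a minimal sketch of the alignment arithmetic (my own toy model, not dm-thin code; the 64 MiB BLOCK_SIZE and the blocks_unmapped helper are assumptions for illustration): a block can only be handed back to the pool when a discard covers it entirely, so a per-file discard smaller than the thin-pool block frees nothing.

/* Toy model (not dm-thin code): a block is unprovisioned only if the
 * discard covers it completely, so the request is rounded inward to
 * block boundaries before counting. */
#include <stdio.h>
#include <stdint.h>

#define BLOCK_SIZE (64ULL << 20)    /* assumed 64 MiB thin-pool block */

/* How many whole blocks does a discard of [start, start + len) cover? */
static uint64_t blocks_unmapped(uint64_t start, uint64_t len)
{
    uint64_t end   = start + len;
    uint64_t first = (start + BLOCK_SIZE - 1) / BLOCK_SIZE; /* round up   */
    uint64_t last  = end / BLOCK_SIZE;                      /* round down */
    return last > first ? last - first : 0;
}

int main(void)
{
    /* A 1 MiB file freed by the filesystem: one 1 MiB discard, no unmap. */
    printf("1 MiB discard unmaps %llu blocks\n",
           (unsigned long long)blocks_unmapped(128ULL << 20, 1ULL << 20));

    /* Only a discard spanning a full 64 MiB block frees pool space. */
    printf("65 MiB discard unmaps %llu blocks\n",
           (unsigned long long)blocks_unmapped(63ULL << 20, 65ULL << 20));
    return 0;
}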

There would be another way to do it (please excuse my obvious arrogance; I know I should write code instead of emails): use two layers. The block size for provisioning stays large, e.g. 64M (customizable, as it is now), while the block size for tracking writes and discards is small, e.g. 4K. The btree would cover only the 64M blocks, and inside each block you would keep two bitmaps over its 16384 4K sub-blocks. One bit means "this 4K block has been written": if it is clear, reads go to the parent snapshot, which avoids the CoW cost of provisioning a new 64M block. The other bit means "this 4K block has been discarded": if it is set, reads return zeroes, and once all 16384 bits are set the whole 64M block gets unprovisioned.

This would play well with RAID alignment, with HDD fragmentation, with CoW (normally no copy is performed if writes are 4K or larger; "read optimizations" could still do it afterwards if needed), with many small discards, with tracking the differences between a parent snapshot and the current snapshot for remote replication, and with compressed backups, which would see zeroes on all discarded areas. It should be possible to add this to your implementation, since the extra metadata is just two more bitmaps per block compared to what you have now.

I would really like to write code for this, but unfortunately I foresee I won't have time for a good while. With this I don't want to appear as if I don't appreciate your current implementation: it is great work, it was very much needed, and in fact I will definitely use it for our production systems once 3.4 is stable (I was waiting for discards).
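A minimal sketch of the metadata this proposal would add, assuming the 64M/4K split described above (the struct, field names and helpers are hypothetical, not dm-thin's actual on-disk format): each provisioned block carries a "written" bitmap that routes reads either to the block or to the parent snapshot, and a "discarded" bitmap that turns reads into zeroes and lets a fully discarded block be unprovisioned.

/* Hypothetical per-block metadata for the two-layer idea above. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PROV_BLOCK_SIZE (64u << 20)                          /* 64 MiB provisioning block  */
#define SUB_BLOCK_SIZE  4096u                                /* 4 KiB tracking granularity */
#define SUB_BLOCKS      (PROV_BLOCK_SIZE / SUB_BLOCK_SIZE)   /* 16384 sub-blocks           */
#define BITMAP_WORDS    (SUB_BLOCKS / 64)

struct prov_block {
    uint64_t written[BITMAP_WORDS];    /* bit set: sub-block was written here       */
    uint64_t discarded[BITMAP_WORDS];  /* bit set: sub-block was discarded          */
};

static bool test_bit(const uint64_t *map, unsigned i)
{
    return (map[i / 64] >> (i % 64)) & 1;
}

static void set_bit(uint64_t *map, unsigned i)
{
    map[i / 64] |= 1ULL << (i % 64);
}

enum read_target { READ_FROM_ORIGIN, READ_FROM_BLOCK, READ_ZEROES };

/* Where should a 4 KiB read of sub-block i be served from? */
static enum read_target resolve_read(const struct prov_block *b, unsigned i)
{
    if (test_bit(b->discarded, i))
        return READ_ZEROES;         /* discarded: report zeroes              */
    if (test_bit(b->written, i))
        return READ_FROM_BLOCK;     /* written here after provisioning       */
    return READ_FROM_ORIGIN;        /* untouched: fall back to parent snap   */
}

/* A whole-sub-block write marks the data present; no copy from the origin
 * is needed, which is what avoids the CoW cost at provisioning time. */
static void note_write(struct prov_block *b, unsigned i)
{
    set_bit(b->written, i);
    b->discarded[i / 64] &= ~(1ULL << (i % 64));
}

/* A discard marks the sub-block; when every sub-block is discarded the
 * whole 64M block can be handed back to the pool.  (A real version would
 * keep a counter instead of rescanning the bitmap each time.) */
static bool note_discard(struct prov_block *b, unsigned i)
{
    set_bit(b->discarded, i);
    for (unsigned w = 0; w < BITMAP_WORDS; w++)
        if (b->discarded[w] != ~0ULL)
            return false;
    return true;                    /* caller may unprovision the block */
}

int main(void)
{
    struct prov_block b = { {0}, {0} };

    note_write(&b, 10);
    printf("sub-block 10 -> %d (READ_FROM_BLOCK)\n",  resolve_read(&b, 10));
    printf("sub-block 11 -> %d (READ_FROM_ORIGIN)\n", resolve_read(&b, 11));

    bool free_block = false;
    for (unsigned i = 0; i < SUB_BLOCKS; i++)
        free_block = note_discard(&b, i);
    printf("whole block discarded, unprovision = %d\n", free_block);
    return 0;
}

At 2 bits per 4K sub-block this is 4 KiB of bitmap per 64M block, which matches the "just two bitmaps more per block" estimate above.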


Y, I'll provide tools to let you do this.  If you wish to help with
writing a replicator please email me.  It's a project I'm keen to get
going.

Thanks for the opportunity, but for now it seems I can only be a leech; at most I have time for writing a few emails :-(

Thank you
S.

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel

