Re: Best method to limit snapshot/clone space overhead

On 07/23/2015 06:31 AM, Jan Schermer wrote:
> Hi all,
> I have been looking for a way to alleviate the overhead of RBD
> snapshots/clones for some time.
>
> In our scenario there are a few “master” volumes that contain
> production data and are frequently snapshotted and cloned for dev/qa
> use. Those snapshots/clones live for a few days to a few weeks before
> they get dropped, and they sometimes grow very fast (databases, etc.).
>
> With the default 4MB object size there seems to be huge overhead
> involved with this; could someone give me some hints on how to solve
> that?
>
> I have some hope in
>
> 1) FIEMAP
> I’ve calculated that files on my OSDs are approx. 30% filled with
> NULLs - I suppose this is what it could save (in the best case), and
> it should also make COW operations much faster.
> But there are lots of bugs in FIEMAP in kernels (I saw some reference
> to the CentOS 6.5 kernel being buggy - which is what we use) and
> filesystems (like XFS). No idea about ext4, which we’d like to use in
> the future.
>
> Is enabling FIEMAP a good idea at all? I saw some mention of it being
> replaced with SEEK_DATA and SEEK_HOLE.

FIEMAP (and Ceph's use of it) has been buggy on all filesystems in the
past. SEEK_DATA and SEEK_HOLE are the proper interfaces to use for
these purposes. That said, this code path isn't well tested since it's
off by default, so I wouldn't recommend using it without careful
testing on the filesystem you're using. I wouldn't expect it to make
much of a difference if you use small objects.
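
If you want to see how much SEEK_DATA/SEEK_HOLE could actually skip on
your OSDs, you can walk the files directly. A rough sketch in Python
3.3+ (untested; it assumes your kernel and filesystem support these
seek flags, and note that allocated runs of zero bytes won't show up
as holes, so the result can come out lower than your raw 30% NULL
figure):

import os

def hole_fraction(path):
    # Fraction of the file's length that is holes, found by
    # alternating SEEK_DATA/SEEK_HOLE (Python 3.3+, needs fs support).
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        if size == 0:
            return 0.0
        holes, offset = 0, 0
        while offset < size:
            try:
                data = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError:
                holes += size - offset  # trailing hole, no data left
                break
            holes += data - offset      # hole (if any) before this extent
            offset = os.lseek(fd, data, os.SEEK_HOLE)  # end of data extent
        return holes / size
    finally:
        os.close(fd)

Running that over a sample of object files in an OSD's data directory
should tell you the best case before you commit to anything.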

> 2) object size < 4MB for clones
> I did some quick performance testing, and setting this lower for
> production is probably not a good idea. My sweet spot is an 8MB
> object size; however, this would make the overhead for clones even
> worse than it already is.
> But I could create the cloned images with a different object size
> than the snapshot has (at least according to the docs). Does someone
> use it like that? Any caveats? That way I could keep the production
> data at an 8MB object size but create the development clones with,
> for example, 64KiB granularity, probably at the expense of some
> performance, but most of the data would remain in the (faster) master
> snapshot anyway. This should drop the overhead tremendously, maybe
> even more than enabling FIEMAP. (Even better when working in tandem,
> I suppose?)

Since these clones are relatively short-lived, this seems like a
better way to go in the short term. 64KiB may be extreme, but if there
aren't too many of these clones it's not a big deal. There is more
overhead for recovery and scrub with smaller objects, so I wouldn't
recommend using tiny objects in general.

It'll be interesting to see your results. I'm not sure many folks
have looked at optimizing this use case.
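
For what it's worth, giving a clone a different object size than its
parent is just a matter of passing a smaller order at clone time, e.g.
on the command line:

rbd clone --order 16 rbd/master@dev-snap rbd/dev-clone

(order 16 means 2^16 = 64KiB objects; the parent keeps its own object
size). The same thing via the python-rbd bindings, as an untested
sketch with placeholder pool/image/snapshot names:

import rados
import rbd

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
try:
    ioctx = cluster.open_ioctx('rbd')       # pool name is an example
    try:
        image = rbd.Image(ioctx, 'master')
        try:
            image.create_snap('dev-snap')
            image.protect_snap('dev-snap')  # clone parents must be protected
        finally:
            image.close()
        # order=16 -> 64KiB objects in the clone only; the parent's
        # objects stay at whatever order it was created with
        rbd.RBD().clone(ioctx, 'master', 'dev-snap', ioctx, 'dev-clone',
                        features=rbd.RBD_FEATURE_LAYERING, order=16)
    finally:
        ioctx.close()
finally:
    cluster.shutdown()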

Josh
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com