Re: RBD snapshots very slow

Jason Dillaman <jdillama@xxxxxxxxxx> · Fri, 20 Mar 2020 13:41:11 -0400

On Fri, Mar 20, 2020 at 1:29 PM <vitalif@xxxxxxxxxx> wrote:
>
> Hi.
>
> For a long time I was under the impression that clones are as efficient
> in BlueStore as snapshots.
>
> But today I finally decided to test it and ... I discovered it was an
> utterly wrong impression :) RBD copies the whole 4 MB object even when a
> small 4 KB block is modified within it in the child image. In my
> all-NVMe cluster this leads to 40 (40!!!) random write iops (bs=4k
> iodepth=1) in a fresh RBD clone, which is terrible.

Anything with an iodepth of 1 is going to be (relatively) terrible on RBD.
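
As a rough illustration of why queue depth 1 is so punishing: with a single
outstanding request, IOPS is simply the inverse of per-request latency, so
the 40 IOPS reported above works out to about 25 ms per 4 KiB write. The
sketch below is plain arithmetic (Little's law), not anything from librbd.
The function names are made up for illustration, and the higher queue depth
figure assumes per-request latency stays constant, which will not hold
exactly in practice.

```python
def latency_from_iops(iops: float, iodepth: int = 1) -> float:
    """Average per-request latency in seconds for a given IOPS and queue depth."""
    if iops <= 0 or iodepth <= 0:
        raise ValueError("iops and iodepth must be positive")
    return iodepth / iops


def iops_from_latency(latency_s: float, iodepth: int = 1) -> float:
    """Little's law: throughput = outstanding requests / latency."""
    if latency_s <= 0 or iodepth <= 0:
        raise ValueError("latency and iodepth must be positive")
    return iodepth / latency_s


# 40 IOPS at iodepth=1 means each write takes roughly 25 ms.
per_write_latency = latency_from_iops(40, iodepth=1)
print(f"latency per write: {per_write_latency * 1000:.1f} ms")

# Assumption: same latency at iodepth=16 (optimistic upper bound).
print(f"upper bound at iodepth=16: {iops_from_latency(per_write_latency, 16):.0f} IOPS")
```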

> Question of the day: is it possible to reimplement RBD clones using
> "sparse objects"? As I understand the support for sparse objects
> themselves is already there. So maybe librbd could only write the
> modified part to the child image when writing and read "holes" from
> parents when reading?

The forthcoming Octopus release of librbd adds support for sparse
copy-up writes [1] when your minimum OSD release is set to Octopus (reads
from the parent image were already sparse-read ops). Using holes was
previously not very practical due to the large allocation sizes on
the OSD, but with the change to a 4 KiB minimum block size, such a
technique would be possible. It would, however, be a breaking change for
all older clients, so it would have to be controlled via a new feature
bit. You can also control the RBD object size and use something smaller
than the 4 MiB default.
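
To put rough numbers on the object-size point, here is a back-of-the-envelope
sketch. It assumes the full-object copy-up behaviour described in the
original message (the first write to a not-yet-copied object in a clone
copies the whole parent object) and that the parent object is fully
populated. The helper names are hypothetical and this is not librbd code. It
also prints the "order" (log2 of the object size) that RBD uses to express
object size, e.g. order 22 for 4 MiB objects.

```python
KIB = 1024
MIB = 1024 * KIB


def copyup_amplification(object_size: int, write_size: int = 4 * KIB) -> float:
    """Bytes copied per byte written on the first write to a clone object.

    Assumes a full-object copy-up of a fully populated parent object.
    """
    if write_size <= 0 or object_size < write_size:
        raise ValueError("need 0 < write_size <= object_size")
    return object_size / write_size


def object_order(object_size: int) -> int:
    """Return log2(object_size); RBD object sizes are powers of two."""
    if object_size <= 0 or object_size & (object_size - 1):
        raise ValueError("object size must be a positive power of two")
    return object_size.bit_length() - 1


for size in (4 * MIB, 1 * MIB, 64 * KIB):
    print(
        f"object size {size // KIB:>5} KiB (order {object_order(size)}): "
        f"{copyup_amplification(size):.0f}x copy-up per first 4 KiB write"
    )
```

This only covers the first write to each object; later writes to an object
that has already been copied up do not pay the cost again. Smaller objects
also mean more objects per image, so this is a trade-off rather than a free
win.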

> --
> Vitaliy Filippov
> _______________________________________________
> Dev mailing list -- dev@xxxxxxx
> To unsubscribe send an email to dev-leave@xxxxxxx
>

[1] https://github.com/ceph/ceph/pull/27999

-- 
Jason