On Fri, Mar 20, 2020 at 1:29 PM <vitalif@xxxxxxxxxx> wrote:
>
> Hi.
>
> For a long time I was under an impression that clones are as efficient
> in bluestore as snapshots.
>
> But today I finally decided to test it and ... I discovered it was an
> utterly wrong impression :) RBD copies the whole 4 MB object even when a
> small 4 KB block is modified within it in the child image. In my
> all-NVMe cluster this leads to 40 (40!!!) random write iops (bs=4k
> iodepth=1) in a fresh RBD clone, which is terrible.

Anything with an iodepth of 1 is going to be (relatively) terrible on RBD.

> Question of the day: is it possible to reimplement RBD clones using
> "sparse objects"? As I understand the support for sparse objects
> themselves is already there. So maybe librbd could only write the
> modified part to the child image when writing and read "holes" from
> parents when reading?

The forthcoming Octopus release of librbd adds support for sparse
copy-up writes [1] when your min OSD release is set to Octopus (reads
from the parent image were already sparse-read ops).

Using holes was previously not very practical due to the large
allocation sizes on the OSD, but with the change to 4KiB minimum block
sizes, such a technique would be possible (albeit as a breaking change
for all older clients, controlled via a new feature bit).

You also have the ability to control the RBD object sizes and use
something smaller than the 4MiB default.

> --
> Vitaliy Filippov
> _______________________________________________
> Dev mailing list -- dev@xxxxxxx
> To unsubscribe send an email to dev-leave@xxxxxxx
>

[1] https://github.com/ceph/ceph/pull/27999

--
Jason
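
To put rough numbers on the object-size point, here is a minimal sketch of the copy-up arithmetic for the first small write to an object in a fresh clone. It assumes a full (non-sparse) copy-up that moves one whole object from the parent, and it only models bytes moved, not latency or IOPS. The function names are made up for this illustration and are not part of librbd.

```python
# Back-of-the-envelope copy-up cost for the first 4 KiB write to an
# object in a fresh clone. Assumption: a full (non-sparse) copy-up
# copies the whole object from the parent before the write is applied.

KIB = 1024
MIB = 1024 * KIB


def object_size_from_order(order: int) -> int:
    """RBD object size is 2**order bytes (order 22 is the 4 MiB default)."""
    return 1 << order


def full_copyup_amplification(object_size: int, write_size: int) -> float:
    """Bytes copied from the parent per byte the client actually wrote."""
    return object_size / write_size


if __name__ == "__main__":
    for order in (22, 20, 16):
        size = object_size_from_order(order)
        ratio = full_copyup_amplification(size, 4 * KIB)
        print(f"order={order} object={size // KIB} KiB amplification={ratio:.0f}x")
```

This cost applies only to the first write that touches each object. Once an object exists in the child image, later writes to it do not trigger another copy-up.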