On 02/26/2016 01:42 PM, Jan Schermer wrote:
The RBD backend might be even worse, depending on how large a dataset you test with. One 4KB write can end up creating a 4MB object, and depending on how well hole-punching and fallocate work on your system you could in theory see a >1000x amplification (4MB / 4KB = 1024) if every write hits a different 4MB chunk (but that's not realistic).
Is that right?
Yes, the size hints rbd sends with its writes end up as an XFS ioctl asking for MIN(rbd object size, filestore_max_alloc_hint_size) (the max is 1MB by default) worth of allocation for writes to new objects.
Depending on how much the benchmark fills the image, this could be a
large or small overhead compared to the amount of data written.
Josh
Jan
On 26 Feb 2016, at 22:05, Josh Durgin <jdurgin@xxxxxxxxxx> wrote:
On 02/24/2016 07:10 PM, Christian Balzer wrote:
10 second rados bench with 4KB blocks, 219MB written in total.
NAND writes per SSD: 41 * 32MB = 1312MB.
10496MB total written to all SSDs.
Amplification: 48!!!
Le ouch.
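Spelling the arithmetic out (the 8-SSD count is implied by the totals above, assuming I'm reading them right):

    per-SSD NAND writes:  41 * 32MB       = 1312MB
    total NAND writes:    1312MB * 8 SSDs = 10496MB
    write amplification:  10496MB / 219MB ~= 48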
In my use case, with rbd cache enabled on all VMs, I expect writes to be rather large for the most part, not like this extreme example.
But as I wrote the last time I did this kind of testing, this is an area where caveat emptor most definitely applies when planning and buying SSDs, and one where the Ceph code could probably do with some attention.
In this case it's likely rados bench using tiny objects that's causing the massive overhead. rados bench does each write to a new object, which ends up as a new file beneath the OSD, with its own xattrs too. For 4KB writes, that's a ton of overhead.
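To make the access pattern concrete, here is a rough librados sketch of what such a workload boils down to: a brand-new 4KB object per write (the pool name and object naming are made up here, and rados bench itself is of course more elaborate):

    #include <rados/librados.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        rados_t cluster;
        rados_ioctx_t io;
        char buf[4096];
        char oid[64];
        int i;

        memset(buf, 'x', sizeof(buf));

        /* Connect with the default config/keyring; error handling trimmed. */
        if (rados_create(&cluster, NULL) < 0 ||
            rados_conf_read_file(cluster, NULL) < 0 ||
            rados_connect(cluster) < 0) {
            fprintf(stderr, "failed to connect to cluster\n");
            return 1;
        }
        if (rados_ioctx_create(cluster, "bench", &io) < 0) {  /* placeholder pool */
            fprintf(stderr, "failed to open pool\n");
            rados_shutdown(cluster);
            return 1;
        }

        /* Every 4KB write targets a distinct, never-before-seen object, so
         * under filestore each one becomes a fresh file (plus its xattrs)
         * on the OSD -- far more metadata than data. */
        for (i = 0; i < 1000; i++) {
            snprintf(oid, sizeof(oid), "bench_obj_%d", i);
            rados_write(io, oid, buf, sizeof(buf), 0);
        }

        rados_ioctx_destroy(io);
        rados_shutdown(cluster);
        return 0;
    }

An rbd workload, by contrast, spreads the same 4KB writes across a comparatively small set of existing 4MB objects.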
fio with the rbd backend will give you a more realistic picture.
In jewel there will be --max-objects and --object-size options for
rados bench to get closer to an rbd-like workload as well.
Josh
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com