RBD backend might be even worse, depending on how large a dataset you try. One 4KB write can end up allocating a whole 4MB object, and depending on how well hole-punching and fallocate work on your system you could in theory end up with >1000x amplification if you always hit a different 4MB chunk (but that's not realistic). Is that right?

Jan

> On 26 Feb 2016, at 22:05, Josh Durgin <jdurgin@xxxxxxxxxx> wrote:
>
> On 02/24/2016 07:10 PM, Christian Balzer wrote:
>> 10 second rados bench with 4KB blocks, 219MB written in total.
>> NAND writes per SSD: 41 * 32MB = 1312MB.
>> 10496MB total written to all SSDs.
>> Amplification: 48!!!
>>
>> Le ouch.
>> In my use case, with rbd cache on all VMs, I expect writes to be rather
>> large for the most part and not like this extreme example.
>> But as I wrote the last time I did this kind of testing, this is an area
>> where caveat emptor most definitely applies when planning and buying SSDs.
>> And where the Ceph code could probably do with some attention.
>
> In this case it's likely rados bench using tiny objects that's
> causing the massive overhead. rados bench does each write to a new
> object, which ends up as a new file beneath the OSD, with its own
> xattrs too. For 4KB writes, that's a ton of overhead.
>
> fio with the rbd backend will give you a more realistic picture.
> In jewel there will be --max-objects and --object-size options for
> rados bench to get closer to an rbd-like workload as well.
>
> Josh
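
[Editor's note: a quick back-of-the-envelope check of the figures quoted above. This is a minimal sketch, not part of the thread; the only assumption beyond the quoted numbers is that 10496MB total over 1312MB per SSD implies 8 SSDs.]

    # Write amplification from the figures Christian quotes above.
    payload_mb = 219                  # data written by the 10s rados bench run
    nand_per_ssd_mb = 41 * 32         # 41 erase blocks of 32MB = 1312MB per SSD
    num_ssds = 8                      # 8 * 1312MB = 10496MB total, as quoted
    total_nand_mb = nand_per_ssd_mb * num_ssds

    print(total_nand_mb / float(payload_mb))   # ~47.9, i.e. "Amplification: 48"

    # Theoretical worst case from the top of this mail: a single 4KB write
    # that allocates a full 4MB RADOS object is 4MB / 4KB = 1024x by itself.
    print((4 * 1024) / 4.0)                    # 1024.0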
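
[Editor's note: a rough sketch of the fio-with-rbd-backend run Josh suggests, assuming fio was built with librbd support. The pool, image, and client names below are placeholders, not values from the thread; the image must exist before the run.]

    # Drive a 4KB random-write workload through fio's rbd ioengine.
    import subprocess

    cmd = [
        "fio",
        "--name=rbd-4k-randwrite",
        "--ioengine=rbd",          # fio's librbd backend
        "--clientname=admin",      # cephx user (placeholder)
        "--pool=rbd",              # placeholder pool name
        "--rbdname=bench-image",   # placeholder image name
        "--rw=randwrite",
        "--bs=4k",
        "--iodepth=32",
        "--direct=1",
        "--runtime=10",
        "--time_based",
    ]
    subprocess.check_call(cmd)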