Re: Observations with a SSD based pool under Hammer

Thanks!

In jewel, as you mentioned, there will be "--max-objects" and "--object-size" options.
That hint will go away or be mitigated with those options. Correct?

Are those options available in this version?

# ceph -v
ceph version 10.0.2 (86764eaebe1eda943c59d7d784b893ec8b0c6ff9)

Rgds,
Shinobu

----- Original Message -----
From: "Josh Durgin" <jdurgin@xxxxxxxxxx>
To: "Jan Schermer" <jan@xxxxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Sent: Saturday, February 27, 2016 7:57:44 AM
Subject: Re:  Observations with a SSD based pool under Hammer

On 02/26/2016 01:42 PM, Jan Schermer wrote:
> The RBD backend might be even worse, depending on how large a dataset you try. One 4KB block can end up creating a 4MB object, and depending on how well hole-punching and fallocate work on your system, you could in theory end up with >1000x amplification if you always hit a different 4MB chunk (but that's not realistic).
> Is that right?

Yes, the size hints rbd sends with writes end up as an xfs ioctl
asking for MIN(rbd object size, filestore_max_alloc_hint_size)
(the max is 1MB by default) for writes to new objects.
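
To make the arithmetic concrete: for a default 4MB rbd object the requested
hint is MIN(4MB, 1MB) = 1MB. The sketch below only approximates that effect
by hand with xfs_io on an ordinary XFS file; the mount point, file name and
osd.0 are placeholders, not anything Ceph creates itself.

# The rule above: MIN(4194304, 1048576) = 1048576 bytes, i.e. a 1MB
# extent size hint. Set the equivalent hint by hand on a test file:
touch /mnt/xfs-test/dummy-object
xfs_io -c "extsize 1048576" /mnt/xfs-test/dummy-object
xfs_io -c "extsize" /mnt/xfs-test/dummy-object    # print the hint back

# The cap can be read from a running OSD's admin socket:
ceph daemon osd.0 config get filestore_max_alloc_hint_size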

Depending on how much the benchmark fills the image, this could be a
large or small overhead compared to the amount of data written.

Josh

> Jan
>
>> On 26 Feb 2016, at 22:05, Josh Durgin <jdurgin@xxxxxxxxxx> wrote:
>>
>> On 02/24/2016 07:10 PM, Christian Balzer wrote:
>>> 10 second rados bench with 4KB blocks, 219MB written in total.
>>> NAND writes per SSD: 41*32MB = 1312MB.
>>> 10496MB total written to all SSDs.
>>> Amplification: 48!!!
>>>
>>> Le ouch.
>>> In my use case with rbd cache on all VMs I expect writes to be rather
>>> large for the most part and not like this extreme example.
>>> But as I wrote the last time I did this kind of testing, this is an area
>>> where caveat emptor most definitely applies when planning and buying SSDs.
>>> And where the Ceph code could probably do with some attention.
>>
>> In this case it's likely rados bench using tiny objects that's
>> causing the massive overhead. rados bench is doing each write to a new
>> object, which ends up in a new file beneath the osd, with its own
>> xattrs too. For 4k writes, that's a ton of overhead.
>>
>> fio with the rbd backend will give you a more realistic picture.
>> In jewel there will be --max-objects and --object-size options for
>> rados bench to get closer to an rbd-like workload as well.
>>
>> Josh
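
As a rough sketch of how those jewel-era options might be combined (the
pool name "bench-test" is a placeholder, and the exact option spelling and
availability should be checked against the installed rados binary):

# 4KB writes spread over a bounded set of 4MB objects, closer to an
# rbd-like workload than one new object per write:
rados bench -p bench-test 60 write -b 4096 \
    --object-size 4194304 --max-objects 1024 --no-cleanup

# remove the benchmark objects afterwards:
rados -p bench-test cleanup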

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


