Re: Observations with a SSD based pool under Hammer

Josh Durgin <jdurgin@xxxxxxxxxx> · Fri, 26 Feb 2016 15:18:54 -0800

On 02/26/2016 02:27 PM, Shinobu Kinjo wrote:
In this case it's likely rados bench using tiny objects that's
causing the massive overhead. rados bench is doing each write to a new
object, which ends up in a new file beneath the osd, with its own
xattrs too. For 4k writes, that's a ton of overhead.

Does that mean we don't get any meaningful results from rados bench in this scenario (using very small objects)?
Or does rados bench itself work as expected, and it's just the 4k writes that are the problem?

It depends what workload you're trying to measure. If you want to
create new objects of a certain size, rados bench is perfect. Large 
writes are reasonably similar to rbd, but not exactly the same. For
small writes it's particularly different from the typical I/O pattern of
something like rbd.

I'm just curious because someone could misjudge the performance of a Ceph cluster based on a result like this under Hammer.

In general I'd recommend using a tool more closely matching your actual
workload, or at least the interface used, e.g. fio with the rbd backend
will be more accurate for rbd than rados bench, cosbench will be better
for radosgw, etc.
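
As a rough illustration of the fio suggestion, here is a minimal sketch that assembles a fio command line using the rbd ioengine. The pool name "rbd", the image name "fio_test", the client name "admin" and the job parameters are all placeholder assumptions, not values from this thread; the image must already exist and fio must be built with rbd support.

```python
import shlex


def fio_rbd_cmd(pool, image, client="admin", bs="4k", iodepth=32, runtime=10):
    """Build a fio command line for random writes through librbd.

    All argument values are illustrative; adjust them to your cluster.
    """
    return [
        "fio",
        "--name=rbd-randwrite",   # arbitrary job name
        "--ioengine=rbd",         # talk to the image via librbd
        f"--clientname={client}", # cephx user, without the "client." prefix
        f"--pool={pool}",
        f"--rbdname={image}",
        "--rw=randwrite",
        f"--bs={bs}",
        f"--iodepth={iodepth}",
        f"--runtime={runtime}",
        "--time_based",
    ]


# Hypothetical pool and image names.
cmd = fio_rbd_cmd(pool="rbd", image="fio_test")
print(shlex.join(cmd))
```

The list can be handed to subprocess.run(cmd) on a client host, or the printed line pasted into a shell.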

Josh

Rgds,
Shinobu

----- Original Message -----
From: "Josh Durgin" <jdurgin@xxxxxxxxxx>
To: "Christian Balzer" <chibi@xxxxxxx>, ceph-users@xxxxxxxxxxxxxx
Sent: Saturday, February 27, 2016 6:05:07 AM
Subject: Re:  Observations with a SSD based pool under Hammer

On 02/24/2016 07:10 PM, Christian Balzer wrote:
10 second rados bench with 4KB blocks, 219MB written in total.
NAND writes per SSD: 41 * 32MB = 1312MB.
10496MB total written to all SSDs.
Amplification: 48!!!
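
The arithmetic behind that figure can be retraced with a short sketch. Two things here are assumptions rather than statements from the post: that "41*32MB" means 41 increments of a counter that ticks once per 32MB, and that there are 8 SSDs, which is inferred from 10496 / 1312.

```python
# Write amplification recomputed from the figures quoted above.
CLIENT_MB = 219        # written by rados bench in 10 seconds
UNITS_PER_SSD = 41     # counter increments per SSD (assumed meaning)
MB_PER_UNIT = 32       # MB per counter increment (assumed meaning)
TOTAL_NAND_MB = 10496  # total written to all SSDs, as reported

nand_mb_per_ssd = UNITS_PER_SSD * MB_PER_UNIT  # 1312
ssd_count = TOTAL_NAND_MB // nand_mb_per_ssd   # inferred, not stated in the post
amplification = TOTAL_NAND_MB / CLIENT_MB      # NAND writes / client writes

print(f"{nand_mb_per_ssd} MB per SSD x {ssd_count} SSDs = {TOTAL_NAND_MB} MB")
print(f"amplification: {amplification:.1f}x")
```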

Le ouch.
In my use case with rbd cache on all VMs I expect writes to be rather
large for the most part and not like this extreme example.
But as I wrote the last time I did this kind of testing, this is an area
where caveat emptor most definitely applies when planning and buying SSDs.
And where the Ceph code could probably do with some attention.

In this case it's likely rados bench using tiny objects that's
causing the massive overhead. rados bench is doing each write to a new
object, which ends up in a new file beneath the osd, with its own
xattrs too. For 4k writes, that's a ton of overhead.

fio with the rbd backend will give you a more realistic picture.
In jewel there will be --max-objects and --object-size options for
rados bench to get closer to an rbd-like workload as well.
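
A minimal sketch of what such an invocation might look like, assuming the option names are exactly as given above and a jewel or later rados binary. The pool name "rbd" and all sizes and counts are illustrative placeholders: 4k ops spread over a bounded set of 4MB objects rather than one new object per write.

```python
import shlex


def rados_bench_cmd(pool, seconds=10, op_size=4096,
                    object_size=4 * 1024 * 1024, max_objects=64):
    """Build a rados bench write command with small ops on larger objects.

    --object-size and --max-objects are the jewel options named above;
    the values passed here are placeholders, not recommendations.
    """
    return [
        "rados", "bench", "-p", pool, str(seconds), "write",
        "-b", str(op_size),                # size of each write op in bytes
        "--object-size", str(object_size), # size of each object in bytes
        "--max-objects", str(max_objects), # cap on objects written to
    ]


# Hypothetical pool name.
bench_cmd = rados_bench_cmd(pool="rbd")
print(shlex.join(bench_cmd))
```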

Josh
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
