Re: ceph osd tell bench

On Fri, May 3, 2013 at 7:34 AM, Travis Rhoden <trhoden@xxxxxxxxx> wrote:
> I have a question about "tell bench" command.
>
> When I run this, is it behaving more or less like a dd on the drive?  It
> appears to be, but I wanted to confirm whether or not it is bypassing all
> the normal Ceph stack that would be writing metadata, calculating checksums,
> etc.
>
> One bit of behavior I noticed a while back that I was not expecting is that
> this command does write to the journal.  It made sense when I thought about
> it, but when I have an SSD journal in front of an OSD, I can't get the "tell
> bench" command to really show me accurate numbers of the raw speed of the
> OSD -- instead I get write speeds of the SSD.  Just a small caveat there.
>
> The upside to that is when you do something like "tell \* bench", you are
> able to see if that SSD becomes a bottleneck by hosting multiple journals,
> so I'm not really complaining.  But it does make it a bit tough to see
> whether one OSD is performing much differently than the others.
>
> But really, I'm mainly curious if it skips any normal metadata/checksum
> overhead that may be there otherwise.

The way this is implemented, it writes data via the FileStore in a
given chunk size. I believe the defaults are 1 GB of data and 4 MB
blocks, but you can set both: "ceph osd tell <id> bench <data_size>
<block_size>" (IIRC). By going through the FileStore it maintains much
of the same workload as an incoming client request would (so it
reports as complete at the same time it would return a "safe" response
to a client, for instance, and does write to the journal), but it does
leave some things out:
1) The OSD runs CRCs on the data in incoming messages; here the data
is generated locally, so of course that doesn't happen.
2) Normal writes require updating PG metadata, which generally adds
one extra write; that isn't included here.
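
As a sketch, an invocation with those sizes spelled out explicitly
might look like this (sizes in bytes; osd.0 and the exact argument
order are from memory, so check against your version -- the command
is only printed here so you can paste it on a cluster node):

```shell
# Sketch, from memory: explicit byte sizes matching the stated defaults
# (1 GB of data written in 4 MB blocks) against a hypothetical osd.0.
DATA=$((1024 * 1024 * 1024))   # total data to write: 1073741824 bytes
BLOCK=$((4 * 1024 * 1024))     # per-write block size: 4194304 bytes
echo "ceph osd tell 0 bench $DATA $BLOCK"
```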

If you increase the amount of data written in the bench to exceed the
journal by some reasonable amount, you should be able to test your
backing store throughput and not just your journal. :)
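
For instance, assuming a hypothetical 5 GB journal, writing twice that
should push well past what the journal can absorb (the journal size and
osd.0 here are made-up values -- substitute your own):

```shell
# Hypothetical: the journal is 5 GB, so bench 2x that amount of data
# so the journal fills and the backing store becomes the bottleneck.
JOURNAL=$((5 * 1024 * 1024 * 1024))   # assumed journal size: 5 GiB
DATA=$((2 * JOURNAL))                 # 10 GiB of bench data
BLOCK=$((4 * 1024 * 1024))            # keep the default 4 MB block size
echo "ceph osd tell 0 bench $DATA $BLOCK"
```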
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



