Re: ceph osd tell bench

Thanks Gregory, that's perfect.  Just the clarification I needed.

I was having the same thought: if I increased the amount of data written to a sufficient size, I would hit the backing store. But even then, since we get just one number from osd tell bench, it would really be some sort of funky average of the two. Like you said, it returns when the write is "safe", and that happens once the last bit has been committed to the journal, not flushed. (I believe my thinking there is correct.) Not a complaint or an issue; I just like to make sure I understand things correctly.

Thanks again.


On Fri, May 3, 2013 at 1:21 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
On Fri, May 3, 2013 at 7:34 AM, Travis Rhoden <trhoden@xxxxxxxxx> wrote:
> I have a question about "tell bench" command.
>
> When I run this, is it behaving more or less like a dd on the drive?  It
> appears to be, but I wanted to confirm whether or not it is bypassing all
> the normal Ceph stack that would be writing metadata, calculating checksums,
> etc.
>
> One bit of behavior I noticed a while back that I was not expecting is that
> this command does write to the journal.  It made sense when I thought about
> it, but when I have an SSD journal in front of an OSD, I can't get the "tell
> bench" command to really show me accurate numbers of the raw speed of the
> OSD -- instead I get write speeds of the SSD.  Just a small caveat there.
>
> The upside to that is when you do something like "tell \* bench", you are
> able to see if that SSD becomes a bottleneck by hosting multiple journals,
> so I'm not really complaining.  But it does make it a bit tough to see if
> perhaps one OSD is performing much differently than the others.
>
> But really, I'm mainly curious if it skips any normal metadata/checksum
> overhead that may be there otherwise.

The way this is implemented, it writes data via the FileStore in a
given chunk size. I believe the defaults are 1 GB of data and 4 MB
blocks, but you can set these: "ceph osd tell <id> bench <data_size>
<block_size>" (IIRC). By going through the FileStore it maintains much of the same
workload as an incoming client request would (so it reports as
complete at the same time it would return a "safe" response to a
client, for instance, and does write to the journal), but it does
leave some stuff out:
1) The OSD runs CRCs on the data in incoming messages; here the data
is generated locally so of course this doesn't happen.
2) Normal writes require updating PG metadata, which generally adds
one extra write that is not included here.
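To make the syntax above concrete: the data and block sizes are plain byte counts. A minimal sketch, assuming the "ceph osd tell <id> bench <data_size> <block_size>" form described here; the OSD id and sizes are made-up examples, and the command is echoed rather than executed since it needs a live cluster.

```shell
# Hypothetical example: bench osd.0 with 2 GiB of data in 4 MiB blocks.
# Both arguments are byte counts; the id and sizes are illustrative only.
DATA_SIZE=$((2 * 1024 * 1024 * 1024))   # 2 GiB = 2147483648 bytes
BLOCK_SIZE=$((4 * 1024 * 1024))         # 4 MiB = 4194304 bytes

# Print the command that would be run against a live cluster:
echo "ceph osd tell 0 bench $DATA_SIZE $BLOCK_SIZE"
```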

If you increase the amount of data written in the bench to exceed the
journal by some reasonable amount, you should be able to test your
backing store throughput and not just your journal. :)
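Putting that suggestion into numbers: if the journal partition were, say, 5 GiB (a made-up size), writing several times that amount should spill well past the journal and exercise the backing disk. A sketch of the arithmetic, again echoing rather than running the command:

```shell
# Hypothetical 5 GiB journal; write 4x the journal size so most of the
# data must be flushed to the backing store during the bench.
JOURNAL_BYTES=$((5 * 1024 * 1024 * 1024))   # 5 GiB = 5368709120 bytes
DATA_SIZE=$((4 * JOURNAL_BYTES))            # 20 GiB = 21474836480 bytes

# Keep the default-ish 4 MiB block size; id 0 is illustrative only.
echo "ceph osd tell 0 bench $DATA_SIZE $((4 * 1024 * 1024))"
```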
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
