Re: RADOS Bench strange behavior

Erwan Velu <erwan@xxxxxxxxxxxx> · Wed, 10 Jul 2013 09:38:12 +0200

Hi,

I've just subscribed to the mailing list, so I may be breaking the thread
as I cannot "reply to all" ;o)

I'd like to share what I found while trying to understand this behavior.

A "rados put" shows the expected behavior, while "rados bench" does
not, even with concurrency set to one.

As a newcomer, I've been reading the code to understand the difference
between the "put" and "bench" approaches.

The first one is pretty straightforward: the IO is done by do_put(),
which calls io_ctx.write() / io_ctx.write_full().
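To make the contrast concrete, here's a minimal sketch of why a put-style
loop can only ever have one IO in flight. blocking_write() and
sequential_put() are hypothetical stand-ins (not rados.cc code) for a loop
around a blocking call such as io_ctx.write_full(), which only returns once
the write has been acknowledged.

```cpp
#include <cassert>

// Hypothetical counter of how many IOs are in flight at any moment.
struct SyncCounter {
  int in_flight = 0;
  int max_in_flight = 0;
};

// Hypothetical stand-in for a blocking write such as io_ctx.write_full():
// the IO is "in flight" only for the duration of the call.
void blocking_write(SyncCounter& c) {
  ++c.in_flight;  // IO submitted
  if (c.in_flight > c.max_in_flight) c.max_in_flight = c.in_flight;
  // ... a real blocking call would wait here until the write is acked ...
  --c.in_flight;  // the call returns only once the IO is done
  assert(c.in_flight == 0);
}

// A put-style loop: each write must return before the next one starts,
// so the observed concurrency is at most one.
int sequential_put(int ops) {
  SyncCounter c;
  for (int i = 0; i < ops; ++i) blocking_write(c);
  return c.max_in_flight;
}
```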

The benchmark, on the other hand, is much more complicated since it
uses AIO.
If I understand correctly, that's mostly to allow higher concurrency.
After a few calls we reach the write_bench() function, which is the main
loop of the benchmark
(https://github.com/ceph/ceph/blob/master/src/common/obj_bencher.cc#L302)

That's where I have trouble understanding how it could work as
expected. Here's why:

From this point,
https://github.com/ceph/ceph/blob/master/src/common/obj_bencher.cc#L330,
we prepare as many objects as there are concurrent_ios.

From this point,
https://github.com/ceph/ceph/blob/master/src/common/obj_bencher.cc#L344,
we issue as many IOs as there are concurrent_ios.

From this point,
https://github.com/ceph/ceph/blob/master/src/common/obj_bencher.cc#L368,
we enter the main loop, which runs until we reach the limit (time or
number of objects).
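The three steps above can be modelled as a slot-based loop. This is a
minimal, simulated sketch under stated assumptions: Slot, slot_bench() and
the tick-based latency are hypothetical, there is no librados involved, and
"waiting" simply jumps to the earliest pending completion. It shows the
generic slot-refill pattern, not what obj_bencher.cc actually does.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical simulated AIO slot: an IO submitted at tick t completes at
// tick t + latency. Not librados, only a model of the slot logic.
struct Slot {
  long done_at = 0;
  bool busy = false;
};

struct BenchResult {
  int completed = 0;
  int max_in_flight = 0;
};

// Sketch of a slot-based bench loop:
//  1) one slot per concurrent IO,
//  2) submit one IO per slot,
//  3) loop: wait for *any* busy slot to finish, then resubmit into it.
BenchResult slot_bench(int concurrent_ios, int total_ops, long latency) {
  assert(concurrent_ios > 0 && latency > 0);
  std::vector<Slot> slots(concurrent_ios);
  long now = 0;
  int issued = 0;
  BenchResult r;

  auto submit = [&](Slot& s) {
    s.done_at = now + latency;
    s.busy = true;
    ++issued;
  };
  auto in_flight = [&] {
    return static_cast<int>(std::count_if(
        slots.begin(), slots.end(), [](const Slot& s) { return s.busy; }));
  };

  // Steps 1 and 2: fill every slot (or as many as there are ops).
  for (Slot& s : slots) {
    if (issued < total_ops) submit(s);
  }

  // Step 3: main loop until every op has completed.
  while (r.completed < total_ops) {
    r.max_in_flight = std::max(r.max_in_flight, in_flight());
    // "Wait" for one IO: pick the busy slot that completes first.
    auto next = std::min_element(slots.begin(), slots.end(),
        [](const Slot& a, const Slot& b) {
          if (a.busy != b.busy) return a.busy;  // busy slots sort first
          return a.done_at < b.done_at;
        });
    now = next->done_at;
    next->busy = false;
    ++r.completed;
    // Reschedule a single IO into the freed slot; the other slots are
    // still outstanding at this point.
    if (issued < total_ops) submit(*next);
  }
  return r;
}
```

In this simplified model, refilling one slot per wake-up still keeps
concurrent_ios IOs in flight, because the other slots are still outstanding
when the freed one is refilled. Whether the real write_bench() behaves the
same way depends on how completions are detected and acknowledged.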

At the start of this loop,
https://github.com/ceph/ceph/blob/master/src/common/obj_bencher.cc#L371,
we wait for all sent IOs (up to concurrent_ios) to complete. By the way,
I didn't understand how the end of an IO is detected. AIO supports
callbacks, signals, or polling; which one is used here? I saw that we
rely on completion_is_done(), which just does a
"return completions[slot]->complete;". The only related code I found is
here, but I'm not sure it's the right one:
https://github.com/ceph/ceph/blob/master/src/tools/rest_bench.cc#L329
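For reference, one common way to detect AIO completion is a callback that
sets a flag and signals a condition variable, so the caller can either poll
the flag (as completion_is_done() seems to) or sleep until notified. Below
is a minimal sketch using only the C++ standard library; Completion and
run_one_async_io() are hypothetical names, not librados' completion type.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical completion object: the IO side calls complete(), which sets
// a flag (for polling) and notifies a condition variable (for waiting).
class Completion {
 public:
  void complete() {
    {
      std::lock_guard<std::mutex> lk(mu_);
      done_ = true;
    }
    cv_.notify_all();
  }

  // Polling check, similar in spirit to completion_is_done().
  bool is_done() {
    std::lock_guard<std::mutex> lk(mu_);
    return done_;
  }

  // Blocking wait: sleeps until complete() has been called.
  void wait() {
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait(lk, [this] { return done_; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool done_ = false;
};

// Simulates one async IO: a worker thread "finishes" the IO after a short
// delay and fires the completion callback.
bool run_one_async_io() {
  Completion c;
  std::thread io([&c] {
    std::this_thread::sleep_for(std::chrono::milliseconds(5));
    c.complete();  // callback fired by the IO side
  });
  c.wait();  // caller blocks until the callback has fired
  bool done = c.is_done();
  io.join();
  return done;
}
```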

Then we reach
https://github.com/ceph/ceph/blob/master/src/common/obj_bencher.cc#L389.
That's where I'm confused: as I understand it, we reschedule _a single
IO_ and go back to the waiting loop, so I don't see how the concurrency
is maintained.

To put it more directly, I think the AIO code acknowledges the IO too
early somewhere, so we send a new IO before the previous one has
completed. That would explain the kind of behavior we're seeing with
Sebastien.
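This hypothesis can be illustrated with a small simulation. In the
hypothetical model below (AckSlot and max_real_outstanding() are
illustrative names, not Ceph code), each IO is acknowledged after
ack_latency ticks but really finishes after real_latency ticks, and the slot
is reused as soon as the ack arrives. If the ack comes too early, more IOs
are really outstanding than concurrent_ios, even with concurrency set to one.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical slot whose IO is *reported* done at ack_at.
struct AckSlot {
  long ack_at = 0;
  bool busy = false;
};

// Returns the maximum number of IOs that are *really* outstanding at once
// when slots are reused on the ack rather than on the real completion.
int max_real_outstanding(int concurrent_ios, int total_ops,
                         long ack_latency, long real_latency) {
  assert(concurrent_ios > 0 && ack_latency > 0 && real_latency > 0);
  std::vector<AckSlot> slots(concurrent_ios);
  std::vector<long> real_done;  // real completion tick of every issued IO
  long now = 0;
  int issued = 0, acked = 0, max_real = 0;

  auto submit = [&](AckSlot& s) {
    s.ack_at = now + ack_latency;             // when the bench thinks it's done
    s.busy = true;
    real_done.push_back(now + real_latency);  // when it is really done
    ++issued;
  };
  auto track = [&] {
    int n = static_cast<int>(std::count_if(
        real_done.begin(), real_done.end(), [&](long t) { return t > now; }));
    max_real = std::max(max_real, n);
  };

  for (AckSlot& s : slots) {
    if (issued < total_ops) submit(s);
  }
  track();

  while (acked < total_ops) {
    // Wait for the earliest ack among busy slots.
    auto next = std::min_element(slots.begin(), slots.end(),
        [](const AckSlot& a, const AckSlot& b) {
          if (a.busy != b.busy) return a.busy;
          return a.ack_at < b.ack_at;
        });
    now = next->ack_at;
    next->busy = false;
    ++acked;
    if (issued < total_ops) submit(*next);  // slot reused on the ack
    track();
  }
  return max_real;
}
```

For example, with concurrent_ios = 1, total_ops = 5, ack_latency = 1 and
real_latency = 4, up to four IOs are really outstanding at once, while with
ack_latency equal to real_latency the count stays at one.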

As a side note, I saw that ceph_clock_now uses gettimeofday(), which is
not resilient to system clock changes (e.g. when an NTP update occurs).
clock_gettime() with CLOCK_MONOTONIC is clearly preferable for computing
time differences like these.
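A minimal sketch of measuring elapsed time with
clock_gettime(CLOCK_MONOTONIC) (POSIX, Linux); monotonic_ns(), elapsed_ns()
and time_a_short_sleep() are hypothetical helpers, not existing Ceph
functions. Per the clock_gettime(2) man page, CLOCK_MONOTONIC is not
affected by discontinuous jumps in the system time (an NTP step, a manual
date change), although it is still slewed by NTP's gradual adjustments.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#include <time.h>

// Hypothetical helper: current CLOCK_MONOTONIC reading in nanoseconds.
// Unlike gettimeofday(), this clock does not jump when the wall-clock
// time is stepped.
std::int64_t monotonic_ns() {
  struct timespec ts;
  int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
  assert(rc == 0);
  (void)rc;  // avoid an unused-variable warning when NDEBUG is set
  return static_cast<std::int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Hypothetical helper: run `work` and return how long it took, in ns.
template <typename F>
std::int64_t elapsed_ns(F&& work) {
  const std::int64_t start = monotonic_ns();
  work();
  return monotonic_ns() - start;
}

// Example: time a 2 ms sleep with the monotonic clock.
std::int64_t time_a_short_sleep() {
  return elapsed_ns(
      [] { std::this_thread::sleep_for(std::chrono::milliseconds(2)); });
}
```

In portable C++, std::chrono::steady_clock provides the same monotonic
guarantee without calling the POSIX API directly.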
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com