Hi,
I've just subscribed to the mailing list. I may be breaking the thread
as I cannot "reply to all" ;o)
I'd like to share my research on understanding this behavior.
A rados put shows the expected behavior, while rados bench does not,
even with the concurrency set to one.
As a newcomer, I've been reading the code to understand the difference
between the "put" and "bench" approaches.
The first is pretty straightforward: the IO goes through do_put, which
calls io_ctx.write{_full}.
The benchmark, on the other hand, goes through a much more complicated
path based on AIO.
If I understand properly, that's mostly to be able to increase
concurrency. After a few calls we reach the write_bench() function,
which is the main loop of the benchmark
(https://github.com/ceph/ceph/blob/master/src/common/obj_bencher.cc#L302).
That's mostly where I have trouble understanding how it can work as
expected; here is why:
Starting at
https://github.com/ceph/ceph/blob/master/src/common/obj_bencher.cc#L330,
we prepare as many objects as we have concurrent_ios.
Then, at
https://github.com/ceph/ceph/blob/master/src/common/obj_bencher.cc#L344,
we dispatch as many IOs as we have concurrent_ios.
Then, at
https://github.com/ceph/ceph/blob/master/src/common/obj_bencher.cc#L368,
we start the main loop, which runs until we hit the limit (elapsed time
or number of objects).
Inside this loop, at
https://github.com/ceph/ceph/blob/master/src/common/obj_bencher.cc#L371,
we wait for the sent IOs (up to concurrent_ios) to complete. By the
way, I didn't understand how the end of an IO is detected. AIO supports
callbacks, signals, or polling; which one is used here? I saw that we
rely on completion_is_done(), which does a "return
completions[slot]->complete;". The only related thing I found is here,
but I'm not sure it's the right one:
https://github.com/ceph/ceph/blob/master/src/tools/rest_bench.cc#L329
Then we reach
https://github.com/ceph/ceph/blob/master/src/common/obj_bencher.cc#L389.
That's where I'm confused: from my understanding we reschedule _a
single IO_ and go back to the waiting loop, so I don't really get how
the concurrency is maintained.
To be more direct about my thoughts: I think that somewhere the AIO
machinery acks the IO too soon, so we send a new IO while the previous
one hasn't actually completed. That would explain the kind of behavior
Sebastien and I are seeing.
As a side note, I saw that ceph_clock_now uses gettimeofday, which is
not resilient to system clock changes (e.g. when an NTP update occurs).
clock_gettime with CLOCK_MONOTONIC is clearly preferable for this kind
of "time difference" computation.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com