Re: RADOS Bench strange behavior

Hi,

I've just subscribed to the mailing list, so I may be breaking the thread as I cannot "reply to all" ;o)

I'd like to share what I have found while trying to understand this behavior.

A rados put shows the expected behavior, while rados bench doesn't, even with the concurrency set to one.

As a newcomer, I've been reading the code to understand the difference between the "put" and "bench" approaches.

The first one is pretty straightforward: the IO is done via do_put, which calls io_ctx.write{full}.

The benchmark, on the other hand, is much more complicated because it uses aio. If I understand properly, that is mostly to be able to increase concurrency. After a few calls we reach the write_bench() function, which contains the main loop of the benchmark (https://github.com/ceph/ceph/blob/master/src/common/obj_bencher.cc#L302)

That's where I have some trouble understanding how it can work as expected. Here is why:

From this point, https://github.com/ceph/ceph/blob/master/src/common/obj_bencher.cc#L330, we prepare as many objects as we have concurrent_ios.

From this point, https://github.com/ceph/ceph/blob/master/src/common/obj_bencher.cc#L344, we dispatch as many IOs as we have concurrent_ios.

From this point, https://github.com/ceph/ceph/blob/master/src/common/obj_bencher.cc#L368, we start the main loop, which runs until we reach the limit (time or number of objects).

At the start of this loop, https://github.com/ceph/ceph/blob/master/src/common/obj_bencher.cc#L371, we wait until all sent IOs (up to concurrent_ios) are completed. By the way, I didn't understand how the end of an IO is detected. AIO supports callbacks, signals or polling. Which one is used? I saw that we rely on completion_is_done(), which does a return completions[slot]->complete; I only found something here, but I'm not sure it's the right one: https://github.com/ceph/ceph/blob/master/src/tools/rest_bench.cc#L329
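
To make the question concrete, here is a minimal single-threaded sketch of one possible combination: a callback sets a per-slot flag, and completion_is_done() only reads that flag. This is not the librados or rest_bench code. FakeAio, Completion and count_done() are hypothetical names made up for the illustration, and in a real client the callback would typically fire from another thread, so the flag would need a lock or an atomic.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an asynchronous I/O engine; not the librados API.
class FakeAio {
 public:
  // Queue a write and remember the callback to run when it finishes.
  void submit(std::function<void()> on_done) {
    pending_.push_back(std::move(on_done));
  }

  // Finish the oldest queued write, if any, and fire its callback.
  bool finish_one() {
    if (pending_.empty()) return false;
    std::function<void()> cb = std::move(pending_.front());
    pending_.pop_front();
    cb();
    return true;
  }

 private:
  std::deque<std::function<void()>> pending_;
};

// Hypothetical per-slot state: just the flag the callback flips.
struct Completion {
  bool complete = false;
};

// Same shape as the function quoted above: it only reads a flag.
bool completion_is_done(const std::vector<Completion>& completions, int slot) {
  assert(slot >= 0 && slot < static_cast<int>(completions.size()));
  return completions[slot].complete;
}

// Submits one write per slot, finishes up to `finished` of them,
// and returns how many slots report done.
int count_done(int slots, int finished) {
  FakeAio aio;
  std::vector<Completion> completions(slots);
  for (int s = 0; s < slots; ++s) {
    aio.submit([&completions, s] { completions[s].complete = true; });
  }
  for (int n = 0; n < finished; ++n) aio.finish_one();
  int done = 0;
  for (int s = 0; s < slots; ++s) {
    if (completion_is_done(completions, s)) ++done;
  }
  return done;
}
```

In this sketch a slot reports done only after its callback has run, so an early ack would have to come from whatever invokes the callback, not from the flag check itself.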

Then we reach https://github.com/ceph/ceph/blob/master/src/common/obj_bencher.cc#L389. That's where I'm confused, as from my understanding we are rescheduling _a single IO_ and going back to the waiting loop. So I don't really get how the concurrency is kept.
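
One way the concurrency could be kept with a single reschedule per iteration: if the wait returns as soon as any one slot has completed (an assumption about the loop, not something confirmed here), then refilling only that slot keeps the number of in-flight writes at concurrent_ios. The following is a small single-threaded simulation of that idea, not Ceph code; simulate(), WindowStats and the per-write durations are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct WindowStats {
  int writes_started;  // total writes submitted
  int min_in_flight;   // lowest in-flight count seen right after a refill
  int max_in_flight;   // highest in-flight count seen right after a submit
};

// Simulates the loop with made-up, positive per-write durations
// (arbitrary time units). One entry in `durations` is one write.
WindowStats simulate(int concurrent_ios, const std::vector<int>& durations) {
  assert(concurrent_ios > 0);
  const int total = static_cast<int>(durations.size());
  std::vector<int> finish_time(concurrent_ios, -1);  // -1 marks an idle slot
  int started = 0;
  int in_flight = 0;
  int now = 0;

  // Initial burst: one write per slot.
  for (int slot = 0; slot < concurrent_ios && started < total; ++slot) {
    finish_time[slot] = now + durations[started++];
    ++in_flight;
  }
  WindowStats stats{0, in_flight, in_flight};

  // Main loop: wait for ONE slot to finish, then refill that slot only.
  while (started < total) {
    const int slot = static_cast<int>(
        std::min_element(finish_time.begin(), finish_time.end()) -
        finish_time.begin());
    now = finish_time[slot];  // "wait" until the earliest write completes
    --in_flight;              // that write is done
    finish_time[slot] = now + durations[started++];
    ++in_flight;              // its replacement is now in flight
    stats.min_in_flight = std::min(stats.min_in_flight, in_flight);
    stats.max_in_flight = std::max(stats.max_in_flight, in_flight);
  }
  stats.writes_started = started;
  return stats;
}
```

Under that assumption the in-flight count returns to concurrent_ios after every refill. If the loop really waited for all slots before rescheduling one, the count would instead drop to one after the first pass.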

To be more direct about my thoughts, I think that somewhere the aio code acks the IO too soon, so we send a new IO while the previous one has not completed yet. That would explain the kind of behavior Sebastien and I are seeing.

As a side note, I saw that ceph_clock_now uses gettimeofday, which is not resilient to system date changes (for example when an NTP update occurs). clock_gettime with CLOCK_MONOTONIC is clearly preferred for this kind of "time difference" computation.
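
A minimal sketch of the monotonic alternative, assuming a POSIX system. monotonic_seconds() and elapsed_since() are hypothetical helper names, not existing Ceph functions. CLOCK_MONOTONIC does not jump when the wall clock is stepped, although it can still be slewed gradually by NTP.

```cpp
#include <cassert>
#include <time.h>

// Returns seconds since an arbitrary fixed point in the past.
// The value is only meaningful when subtracted from another reading.
double monotonic_seconds() {
  struct timespec ts;
  const int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
  assert(rc == 0);
  (void)rc;  // silence the unused-variable warning when NDEBUG is defined
  return static_cast<double>(ts.tv_sec) +
         static_cast<double>(ts.tv_nsec) / 1e9;
}

// Hypothetical helper: seconds elapsed since an earlier reading `start`.
double elapsed_since(double start) {
  return monotonic_seconds() - start;
}
```

A benchmark would take one reading before the first write and call elapsed_since() on it when computing bandwidth or checking the time limit.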