<sorry, forgot to add ceph-devel and Mahati>

2018-06-01 20:21 GMT+08:00 Jason Dillaman <jdillama@xxxxxxxxxx>:
> On Fri, Jun 1, 2018 at 4:20 AM, Yingxin Cheng <yingxincheng@xxxxxxxxx> wrote:
>
> Have you disabled the librbd in-memory cache during your tests? The
> cache has a giant global lock that causes plenty of thread contention.

I actually examined 3 giant global locks when the writethrough cache is
enabled:

a) librbd::io::ObjectDispatcher::mlock
   wait avg: 1.15us (1 worker) -> 8.58us (8 workers)
b) librbd::io::ImageCtx::snap_lock
   wait avg: 1.14us (1 worker) -> 7.09us (8 workers)
c) librbd::cache::ObjectCacherObjectDispatch::m_cache_lock
   wait avg: 0.75us (1 worker) -> 1385us (8 workers)

I think this is because the critical section of m_cache_lock is too
large: it prevents the expensive "writex" operations from executing
concurrently.

===

Surprisingly, even after the cache is disabled, the worker contention is
still there (see the updated graphs in
https://docs.google.com/document/d/1r8VJiTbs68X42Hncur48pPlZbL_yw8BTSdSKxBkrSPk/edit?usp=sharing).

So I turned to another global lock, "ThreadPool::_lock". According to
the implementation of `ThreadPool::worker(WorkThread *wt)`, each librbd
worker thread does 4 types of things related to this lock:

a) execute inside the critical section of _lock, including
   `_void_dequeue` of the item;
b) execute outside the critical section to `process()` the dequeued item;
c) wait to re-enter _lock after `process()` completes;
d) sleep and try to re-enter _lock when the workqueue is empty.

It is guaranteed that a) + b) + c) + d) account for 100% of the worker's
lifecycle. Here are the results when the cache is disabled:

      1 worker   2 workers   4 workers   10 workers
a)    11.14%     13.99%      15.38%      06.68%
b)    83.83%     79.11%      66.60%      21.44%
c)    05.03%     06.89%      15.40%      17.59%
d)    00.00%     00.00%      02.63%      54.29%

Further, the absolute locked time of "ThreadPool::_lock" takes 11.14%,
27.98%, 61.52%, 66.84% of the total recorded period with 1/2/4/10
workers. I think this implies that this implementation also needs to be
improved.

> The next known spot for thread contention is in librados since
> each per-OSD session has a lock, so the fewer OSDs you have, the
> higher the probability for IO contention.

I have 3 OSDs in this environment.

> Finally, within librados,
> all AIO completions are fired from a single thread -- so even if you
> are pumping data to the OSDs using 8 threads, you are only getting
> serialized completions.
>
> Just prior to Cephalocon I had created a test branch which switched
> the librados AIO completions to the fast-dispatcher path, which had a
> noticeable improvement in latency. Mahati (CCed) is also investigating
> librbd/librados performance.
>
> --
> Jason

I also have a question from when I tried to enable multiple RBD workers:
what's the status of http://tracker.ceph.com/issues/17379? Is it still
ongoing?

--Yingxin
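
P.S. To make the worker lifecycle above concrete, here is a minimal,
simplified sketch of the dequeue/process loop pattern I described (my own
illustration, not the actual ThreadPool code in Ceph; the class and member
names are made up). The comments mark the four phases a) through d):

#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>

// Simplified stand-in for the real ThreadPool; illustrative only.
class SimpleThreadPool {
  std::mutex _lock;                           // analogue of ThreadPool::_lock
  std::condition_variable _cond;
  std::deque<std::function<void()>> _queue;   // analogue of the workqueues
  bool _stop = false;

public:
  void worker() {
    std::unique_lock<std::mutex> l(_lock);
    while (!_stop) {
      if (!_queue.empty()) {
        // a) dequeue the item inside the critical section of _lock
        auto item = std::move(_queue.front());
        _queue.pop_front();

        l.unlock();
        item();      // b) process() the item outside the critical section
        l.lock();    // c) wait (contend) to re-enter _lock after processing
      } else {
        _cond.wait(l);  // d) sleep until new work arrives, then retake _lock
      }
    }
  }

  void queue(std::function<void()> fn) {
    std::lock_guard<std::mutex> g(_lock);
    _queue.push_back(std::move(fn));
    _cond.notify_one();
  }

  void stop() {
    std::lock_guard<std::mutex> g(_lock);
    _stop = true;
    _cond.notify_all();
  }
};

With N workers, phases a), c) and d) all serialize on the single _lock,
which is consistent with the numbers above: as workers are added, the
share of time spent in b) (useful work) shrinks while c) and d) grow.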