On Fri, Jun 1, 2018 at 4:20 AM, Yingxin Cheng <yingxincheng@xxxxxxxxx> wrote:
> Hi Cephers,
>
> I'm investigating the performance of the `librbd.aio_write` datapath
> using distributed tracing across the entire Ceph cluster. The basic
> idea is to compute the internal request throughput between tracepoints
> in the Ceph source code. A bottleneck can then be identified by finding
> the code span where the throughput drops most significantly. Here is
> one I identified in ImageRequestWQ:
>
> In my test cluster (latest dev code), I issued 1000 random 4K
> aio_write requests to librbd. The throughput of
> `ImageRequestWQ::queue()` reaches ~30000 IOPS, but the throughput of
> `ImageRequestWQ::_void_dequeue()` and the following `process()` drops
> significantly, to only ~11000 IOPS [1]. This means the maximum internal
> consumption rate of the rbd workers is ~11000 IOPS in this scenario,
> with the default setting "rbd op threads = 1".
>
> So it looked like a problem of not enough workers, and I increased the
> number of workers to 8. However, the throughput of `_void_dequeue()`
> didn't increase; instead, it dropped to only ~3200 IOPS [2]. This
> implies there is too much resource contention between the rbd op
> workers.
>
> I'm trying to figure out the root causes of this problem, but first I
> want to ask: is there any existing related work in the community, or
> any other information that could help narrow down the root causes?

Have you disabled the librbd in-memory cache ("rbd cache = false") during
your tests? The cache has a giant global lock that causes plenty of
thread contention.

The next known spot for thread contention is in librados, since each
per-OSD session has a lock; the fewer OSDs you have, the higher the
probability of IO contention. Finally, within librados, all AIO
completions are fired from a single thread -- so even if you are pumping
data to the OSDs using 8 threads, you only get serialized completions.
Just prior to Cephalocon I had created a test branch that switched the
librados AIO completions to the fast-dispatch path, which showed a
noticeable improvement in latency. Mahati (CCed) is also investigating
librbd/librados performance.

> [1-2] https://docs.google.com/document/d/1r8VJiTbs68X42Hncur48pPlZbL_yw8BTSdSKxBkrSPk/edit?usp=sharing
>
> Thanks!
> Yingxin
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Jason