Hi Cephers,

I'm investigating the performance of the librbd aio_write datapath using distributed tracing across the entire Ceph cluster. The basic idea is to compute internal request throughputs from tracepoints in the Ceph source code; a bottleneck can then be identified by finding the code span where the throughput drops most significantly. (A rough sketch of this analysis is in the P.S. below.)

Here is one such drop, related to ImageRequestWQ. In my test cluster (latest dev code), I issued 1000 random 4K aio_write requests into librbd (roughly the loop in the second sketch below). The throughput at ImageRequestWQ::queue() reaches ~30000 IOPS, but the throughput at ImageRequestWQ::_void_dequeue() and the subsequent process() drops to only ~11000 IOPS [1]. This means the maximum internal consumption rate of the rbd worker is ~11000 IOPS in this scenario with the default setting rbd_op_threads = 1, which suggested a "not enough workers" problem.

So I then increased the number of workers to 8 (third sketch below). However, the throughput of _void_dequeue() did not increase; instead, it dropped to only ~3200 IOPS [2]. This implies heavy resource contention between the rbd op worker threads.

I'm trying to figure out the root causes of this problem, but first I want to ask: is there any existing related work in the community? Or any other information that could help narrow down the root causes?

[1][2] https://docs.google.com/document/d/1r8VJiTbs68X42Hncur48pPlZbL_yw8BTSdSKxBkrSPk/edit?usp=sharing

Thanks!
Yingxin
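
P.S. Three sketches for context. These are not from the actual tracing setup; all names, timestamps, and parameters in them are placeholders.

1) The bottleneck detection boils down to comparing per-tracepoint throughputs and flagging the largest drop between adjacent tracepoints. A minimal post-processing sketch, assuming each tracepoint yields a sorted list of event timestamps in seconds (the sample data below is made up):

    #include <cstdio>
    #include <string>
    #include <utility>
    #include <vector>

    // Throughput (IOPS) over the span from first to last event.
    // Assumes timestamps are sorted ascending.
    static double iops(const std::vector<double>& ts) {
      if (ts.size() < 2)
        return 0.0;
      return (ts.size() - 1) / (ts.back() - ts.front());
    }

    int main() {
      // Placeholder data: (tracepoint name, event timestamps).
      std::vector<std::pair<std::string, std::vector<double>>> points = {
        {"ImageRequestWQ::queue",         {0.0, 0.1, 0.2, 0.3}},
        {"ImageRequestWQ::_void_dequeue", {0.0, 0.3, 0.6, 0.9}},
        {"ImageRequestWQ::process",       {0.0, 0.3, 0.6, 0.9}},
      };

      size_t worst = 0;
      double worst_ratio = 1.0;
      for (size_t i = 1; i < points.size(); ++i) {
        double prev = iops(points[i - 1].second);
        double cur  = iops(points[i].second);
        double ratio = (prev > 0.0) ? cur / prev : 1.0;
        printf("%-32s %10.1f IOPS (x%.2f vs previous)\n",
               points[i].first.c_str(), cur, ratio);
        // Track the tracepoint with the sharpest relative drop.
        if (ratio < worst_ratio) {
          worst_ratio = ratio;
          worst = i;
        }
      }
      printf("largest drop entering: %s\n", points[worst].first.c_str());
      return 0;
    }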
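
2) The workload itself is nothing exotic; something along these lines via the public librbd C++ API reproduces it (pool name, image name, and client id are placeholders; build with -lrbd -lrados):

    #include <cstdlib>
    #include <string>
    #include <vector>
    #include <rados/librados.hpp>
    #include <rbd/librbd.hpp>

    int main() {
      librados::Rados rados;
      rados.init("admin");            // client id (placeholder)
      rados.conf_read_file(nullptr);  // default ceph.conf search path
      rados.connect();

      librados::IoCtx ioctx;
      rados.ioctx_create("rbd", ioctx);   // pool name (placeholder)

      librbd::RBD rbd;
      librbd::Image image;
      rbd.open(ioctx, image, "testimg"); // image name (placeholder)

      uint64_t size = 0;
      image.size(&size);

      const size_t kBlock = 4096;
      librados::bufferlist bl;
      bl.append(std::string(kBlock, 'x'));

      // Issue 1000 random 4K-aligned aio_writes, then wait for all.
      std::vector<librbd::RBD::AioCompletion*> comps;
      for (int i = 0; i < 1000; ++i) {
        uint64_t off = (rand() % (size / kBlock)) * kBlock;
        auto *c = new librbd::RBD::AioCompletion(nullptr, nullptr);
        comps.push_back(c);
        image.aio_write(off, kBlock, bl, c);
      }
      for (auto *c : comps) {
        c->wait_for_complete();
        c->release();
      }

      image.close();
      rados.shutdown();
      return 0;
    }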
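
3) The worker-count change, assuming I have the option name right (rbd_op_threads, default 1, which sizes the ImageRequestWQ thread pool), is just a client-side config tweak:

    # client-side ceph.conf (placeholder section; 8 was my test value)
    [client]
        rbd op threads = 8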