On Fri, Jun 1, 2018 at 4:20 AM, Yingxin Cheng <yingxincheng@xxxxxxxxx> wrote:
> Hi Cephers,
>
> I'm investigating the performance of the `librbd.aio_write` datapath
> using distributed tracing across the entire Ceph cluster. The basic
> idea is to compute the internal request throughput between tracepoints
> in the Ceph source code. A bottleneck can then be identified by finding
> the code span where the throughput drops most significantly. Here is
> one I identified in ImageRequestWQ:
>
> In my test cluster (latest dev code), I issued 1000 random 4K
> aio_write requests to librbd. The throughput of
> `ImageRequestWQ::queue()` reaches ~30000 IOPS, but the throughput of
> `ImageRequestWQ::_void_dequeue()` and the following `process()` drops
> significantly, to only ~11000 IOPS [1]. This means the maximum internal
> consumption rate of the rbd workers is ~11000 IOPS in this scenario,
> with the default setting "rbd op threads = 1".
>
> So it looked like a problem of not enough workers, and I increased the
> number of workers to 8. However, the throughput of `_void_dequeue()`
> didn't increase; instead, it dropped to only ~3200 IOPS [2]. This
> implies there is too much resource contention between the rbd op
> workers.
>
> I'm trying to figure out the root causes of this problem, but first I
> want to ask: is there any existing related work in the community, or
> any other information that could help narrow down the root causes?

Have you disabled the librbd in-memory cache ("rbd cache = false") during
your tests? The cache has a giant global lock that causes plenty of
thread contention.

The next known spot for thread contention is in librados, since each
per-OSD session has a lock; the fewer OSDs you have, the higher the
probability of IO contention. Finally, within librados, all AIO
completions are fired from a single thread -- so even if you are pumping
data to the OSDs using 8 threads, you only get serialized completions.
Just prior to Cephalocon I had created a test branch that switched the
librados AIO completions to the fast-dispatch path, which showed a
noticeable improvement in latency. Mahati (CCed) is also investigating
librbd/librados performance.

> [1-2] https://docs.google.com/document/d/1r8VJiTbs68X42Hncur48pPlZbL_yw8BTSdSKxBkrSPk/edit?usp=sharing
>
> Thanks!
> Yingxin
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Jason