RE: Librbd performance issue

Hi Yingxin,

I have done some profiling on librbd/librados, and that work is still ongoing. Right now I'm experimenting with how AIO completions could be dispatched differently than they are today. Feel free to ask any specific questions. Looking deeper into the other areas Jason mentioned would be a good way to start narrowing down the bottlenecks.
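
To illustrate the general idea only (this is just a sketch, not the actual change; CompletionPool is a made-up name and nothing from the Ceph tree): one alternative is to fan completion callbacks out over a small pool of threads instead of a single finisher.

#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Sketch only: a tiny pool that invokes completion callbacks from N threads
// instead of a single finisher thread.
class CompletionPool {
public:
  explicit CompletionPool(size_t nthreads) {
    for (size_t i = 0; i < nthreads; ++i) {
      workers.emplace_back([this] {
        for (;;) {
          std::function<void()> cb;
          {
            std::unique_lock<std::mutex> l(lock);
            cond.wait(l, [this] { return stopping || !callbacks.empty(); });
            if (stopping && callbacks.empty())
              return;
            cb = std::move(callbacks.front());
            callbacks.pop();
          }
          cb();  // completion fires on one of N threads, not always the same one
        }
      });
    }
  }

  ~CompletionPool() {
    {
      std::lock_guard<std::mutex> l(lock);
      stopping = true;
    }
    cond.notify_all();
    for (auto &t : workers)
      t.join();
  }

  // called where the library would currently queue onto its finisher
  void queue(std::function<void()> cb) {
    {
      std::lock_guard<std::mutex> l(lock);
      callbacks.push(std::move(cb));
    }
    cond.notify_one();
  }

private:
  std::vector<std::thread> workers;
  std::queue<std::function<void()>> callbacks;
  std::mutex lock;
  std::condition_variable cond;
  bool stopping = false;
};

The tricky part with anything like this is preserving whatever completion ordering callers may rely on.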

Mahati

-----Original Message-----
From: ceph-devel-owner@xxxxxxxxxxxxxxx <ceph-devel-owner@xxxxxxxxxxxxxxx> On Behalf Of Jason Dillaman
Sent: Friday, June 1, 2018 5:51 PM
To: Yingxin Cheng <yingxincheng@xxxxxxxxx>
Cc: Ceph Development <ceph-devel@xxxxxxxxxxxxxxx>; Chamarthy, Mahati <mahati.chamarthy@xxxxxxxxx>
Subject: Re: Librbd performance issue

On Fri, Jun 1, 2018 at 4:20 AM, Yingxin Cheng <yingxincheng@xxxxxxxxx> wrote:
> Hi Cephers,
>
> I'm investigating the performance of the `librbd.aio_write` datapath
> using distributed tracing across the entire Ceph cluster. The basic
> idea is to compute internal request throughput at tracepoints in the
> Ceph source code; the bottleneck can then be identified by finding the
> code spans where throughput drops most sharply. Here is one such spot,
> related to ImageRequestWQ:
>
> In my test cluster (running the latest dev code), I issued 1000 random
> 4K aio_write requests through librbd. The throughput at
> `ImageRequestWQ::queue()` reaches ~30000 IOPS, but the throughput at
> `ImageRequestWQ::_void_dequeue()` and the following `process()` drops
> sharply to only ~11000 IOPS [1]. In other words, the maximum internal
> consumption rate of the rbd op workers is ~11000 IOPS in this scenario,
> with the default setting "rbd op thread = 1".
>
> That suggested the problem was simply "not enough workers", so I tried
> increasing the number of workers to 8. However, the throughput of
> `_void_dequeue()` didn't increase; instead it dropped to only ~3200
> IOPS [2]. This implies there is heavy resource contention between the
> rbd op workers.
>
> I'm trying to figure out the root causes of this problem, but first I
> want to ask: is there any existing work on this in the community? Or
> is there any other information that could help narrow down the root
> causes?
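
(I assume by "rbd op thread" you mean the rbd_op_threads option, so the 8-worker run would correspond to something like this in the client's ceph.conf:)

[client]
rbd op threads = 8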

Have you disabled the librbd in-memory cache during your tests? The cache has a giant global lock that causes plenty of thread contention.
The next known spot for thread contention is in librados, since each per-OSD session has a lock; the fewer OSDs you have, the higher the probability of IO contention. Finally, within librados, all AIO completions are fired from a single thread -- so even if you are pumping data to the OSDs from 8 threads, you still get serialized completions.
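
For a quick check, you can turn the cache off on the client side, e.g. in ceph.conf:

[client]
rbd cache = false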

Just prior to Cephalocon I created a test branch that switched the librados AIO completions to the fast-dispatch path, which showed a noticeable improvement in latency. Mahati (CCed) is also investigating librbd/librados performance.
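
For context, here is roughly what that callback path looks like from the client side -- a minimal librados sketch (pool and object names are just placeholders):

#include <rados/librados.hpp>
#include <string>

// Every completion callback registered this way is invoked from librados'
// single finisher thread, so completions are serialized even if the writes
// were submitted from many threads.
static void on_complete(librados::completion_t /*c*/, void * /*arg*/) {
  // per-I/O completion handling runs here, one callback at a time
}

int main() {
  librados::Rados cluster;
  cluster.init(nullptr);              // default client id
  cluster.conf_read_file(nullptr);    // default ceph.conf search path
  cluster.connect();

  librados::IoCtx ioctx;
  cluster.ioctx_create("rbd", ioctx); // example pool name

  librados::bufferlist bl;
  bl.append(std::string(4096, 'x'));  // 4K payload

  librados::AioCompletion *c =
      cluster.aio_create_completion(nullptr, on_complete, nullptr);
  ioctx.aio_write("test-object", c, bl, bl.length(), 0);

  c->wait_for_complete();
  c->release();
  cluster.shutdown();
  return 0;
}

All of the on_complete invocations above come from that one thread, regardless of how many threads called aio_write.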

> [1-2] https://docs.google.com/document/d/1r8VJiTbs68X42Hncur48pPlZbL_yw8BTSdSKxBkrSPk/edit?usp=sharing
>
>
> Thanks!
> Yingxin



--
Jason