After looking into “ThreadPool::_lock” and the related implementations in ImageRequestWQ, it turns out that the “lockdep” check in my vstart environment is the main reason the entire io-workqueue is slowed down. So I disabled “lockdep” and ran another round of experiments:

[1-2] With the cache disabled, internal io-worker performance improves from 16000 to 31000 IOPS when the number of op workers is increased from 1 to 8.

[3-4] With the cache enabled, internal worker performance is worse (IOPS down 37.5% with 1 worker), and adding workers decreases it further (internal IOPS down 60.6% with 4 workers).

ImageRequestWQ itself is no longer a bottleneck: the waiting time on “ThreadPool::_lock” is not significant, and I could not get better numbers by removing the blockers inside the lock.

I think the above results point to two major improvement directions:
a) A better cache design that can be driven by multiple threads.
b) Allowing multiple io-workers in librbd, which could potentially bring up to a 200% IOPS improvement in triggering RADOS writes to the OSDs.

[1-4] https://docs.google.com/document/d/1r8VJiTbs68X42Hncur48pPlZbL_yw8BTSdSKxBkrSPk/edit?usp=sharing

---------

I’m still looking forward to b) allowing multiple io-workers in librbd. https://github.com/ceph/ceph/pull/20482 seems to implement a sane destruction order when there are multiple workers. If it doesn’t fix all of the race conditions, what are the other scenarios? Are there any existing error logs, or any unit/integration tests, that I can refer to? I didn’t see any explicit failures during my experiments with the multiple-worker configuration.

--Yingxin
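
P.S. For anyone who wants to try the same setup, below is a minimal sketch of the knobs involved. The exact values and the fio job are my assumptions about a typical reproduction, not necessarily the parameters behind the numbers in [1-4]:

    # ceph.conf of the vstart cluster ([global] or [client] section)
    lockdep = false        ; disable the lockdep checks discussed above
    rbd cache = false      ; toggle to compare cache-enabled vs. cache-disabled runs
    rbd op threads = 8     ; number of librbd op/io worker threads (default is 1)

    # example fio job against an existing RBD image (assumed reproduction tool)
    [rbd-4k-randwrite]
    ioengine=rbd
    clientname=admin
    pool=rbd
    rbdname=testimg
    rw=randwrite
    bs=4k
    iodepth=32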