Ugh, trying this one more time. :(
I didn't see the original version of this message go out to ceph-devel
(possibly due to the attachment). I'll use pastebin instead:
https://pastebin.com/6txSkM7r
While debugging a bluestore issue I grabbed a wall-clock profile of a
client fio process while doing 4k random writes. This likely is not
associated with the other issues I'm trying to track down, but I wanted
to share since there's some locking going on that might be worth looking
at. FWIW, this is a single client issuing 32 concurrent IOs to a 3x rep
pool on a cluster with 16 OSDs on NVMe. The NVMe drives are nowhere
near saturated and the OSDs are semi-idle waiting for work. With the
profiler not running, top showed the fio process using about 300-320% CPU
and fio was doing somewhere around 14-15k IOPS.
1) 7 - tp_librbd, line 82
Lots of stuff going on here, but the big thing is all the time spent in
librbd::ImageCtx::write_to_cache. 70.2% of the total time in this
thread is spent in ObjectCacher::writex with lots of nested stuff, but
if you look all the way down on line 1293, another 11.8% of the time is
spent in Locker() and 1.5% is spent in ~Locker().
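
For context, the Locker()/~Locker() frames are just the RAII guard around
the Mutex: the constructor is where the thread actually acquires (and
potentially waits on) the lock, and the destructor is the release. Roughly
this shape, simplified rather than the real implementation:

#include <mutex>

// Simplified stand-in for Ceph's Mutex / Mutex::Locker, just to show where
// the profiler samples land; not the actual implementation.
class Mutex {
  std::mutex m;
public:
  void Lock()   { m.lock(); }    // samples here show up under Locker()
  void Unlock() { m.unlock(); }  // samples here show up under ~Locker()

  class Locker {
    Mutex &mutex;
  public:
    explicit Locker(Mutex &m_) : mutex(m_) { mutex.Lock(); }  // acquire on construction
    ~Locker() { mutex.Unlock(); }                             // release on destruction
  };
};

So most of that 11.8% is probably time spent waiting to acquire the lock
rather than doing real work.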
Of note is that ~12% of the time in this thread was spent in
Objecter::_calc_target. A bit less than half of that was in CRUSH itself.
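
To put the _calc_target number in perspective, every op re-derives its
target from the current osdmap: hash the object name to a PG, then run the
CRUSH calculation to get the acting set. Something like the sketch below,
where the names and math are purely illustrative stand-ins for the real
hashing and crush rule evaluation:

#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Rough per-op shape of what _calc_target has to do; names and math are
// illustrative only.  The profile puts a bit under half of the cost in the
// CRUSH step and the rest in the hashing / osdmap lookups around it.
struct Target {
  uint32_t pg;            // placement group the object hashes to
  std::vector<int> osds;  // acting set chosen for that PG
};

Target calc_target_sketch(const std::string &oid, uint32_t pg_num,
                          int num_osds, int replicas) {
  Target t;
  // Step 1: object name -> PG.  (The real code uses ceph_str_hash and
  // stable_mod; std::hash is just a stand-in here.)
  t.pg = std::hash<std::string>{}(oid) % pg_num;
  // Step 2: PG -> OSDs.  Stand-in for the CRUSH rule evaluation.
  for (int r = 0; r < replicas; ++r)
    t.osds.push_back(static_cast<int>((t.pg * 2654435761u + r) % num_osds));
  return t;
}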
2) 8 - fn-radosclient, line 1641
This thread is spending about half its time in Context::complete, and
half its time just waiting around. On line 1651 in
librbd::C_OrderedWrite::finish, we see a whopping 33.60% of the samples
spent in Locker doing Mutex::Lock. On line 1819, 3.15% is spent in ~Locker
doing Mutex::Unlock.
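
Putting the two threads side by side, what I'd want to check is whether
they're both funneling through the same per-image cache lock: tp_librbd
taking it on the way into writex, and fn-radosclient taking it again in
C_OrderedWrite::finish on the way out. That's an assumption on my part, but
if it's right the shape is roughly this (illustrative names, not the actual
librbd code):

#include <mutex>

// Hedged sketch of the contention pattern the two profiles suggest,
// assuming (my assumption, not something the profile proves) that writex
// and the ordered-write completion serialize on one per-image cache mutex.
struct ImageCacheSketch {
  std::mutex cache_lock;  // single lock guarding the object cache state

  // Submission side, tp_librbd: write_to_cache -> ObjectCacher::writex.
  void write_to_cache_sketch(/* object extent + data */) {
    std::lock_guard<std::mutex> l(cache_lock);  // the Locker()/~Locker() time in thread 7
    // ... find or create cache buffers, copy the 4k payload, mark dirty ...
  }

  // Completion side, fn-radosclient: the ordered-write finish needs the
  // same lock to mark extents clean and complete the client's aio in order.
  void ordered_write_finish_sketch(int r) {
    std::lock_guard<std::mutex> l(cache_lock);  // the 33.6% Mutex::Lock time in thread 8
    // ... mark clean, pop the ordered-write queue, call the user's completion ...
  }
};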
Otherwise we see some of the usual suspects sprinkled around: bufferlist
appends, encode/decode, fast_dispatch, etc.
Mark