What's the tool you use to trace those threads, as Mark Nelson shows in the link? I think it's pretty cool and useful. Thanks.

2017-05-19 0:33 GMT+08:00 Ning Yao <zay11022@xxxxxxxxx>:
> 2017-05-11 8:21 GMT+08:00 Mark Nelson <mnelson@xxxxxxxxxx>:
>>
>> On 05/10/2017 06:24 PM, Mark Nelson wrote:
>>>
>>> On 05/10/2017 05:31 PM, Jason Dillaman wrote:
>>>>
>>>> On Wed, May 10, 2017 at 6:10 PM, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
>>>>>
>>>>> 1) 7 - tp_librbd, line 82
>>>>>
>>>>> Lots of stuff going on here, but the big thing is all the time spent
>>>>> in librbd::ImageCtx::write_to_cache. 70.2% of the total time in this
>>>>> thread is spent in ObjectCacher::writex with lots of nested stuff,
>>>>> but if you look all the way down on line 1293, another 11.8% of the
>>>>> time is spent in Locker() and 1.5% of the time in ~Locker().
>>>>
>>>> Yes -- the ObjectCacher is long overdue for a rewrite since it's
>>>> single threaded. It looks like you were essentially performing
>>>> writethrough as well. I'd imagine you would be better off disabling
>>>> the rbd cache when doing high-performance random write workloads,
>>>> since you are going to get zero benefit from the cache with that
>>>> workload -- at least that's what I usually recommend.
>>>
>>> Often I do turn rbd cache off for bluestore testing. This was an older
>>> conf file where I inadvertently hadn't disabled it. Still, it's an
>>> unfortunate choice that has to be made, potentially by someone other
>>> than the user running the workload. :/
>>
>> Yep, disabling rbd cache bumped 4K write IOPS from ~14K to ~31K, and
>> closer to ~36-37K with a higher IO depth, at around 410% CPU usage.
>>
>> Trace without rbd cache here:
>>
>> https://pastebin.com/t8FFsWNb
>>
>> Looking a lot better, though thread 5 (tp_librbd) is still pegging.
>> There is still a little bit of locking in various places, and a bit
>> more time has shifted into _calc_target.
>>
>> The async msgr is, as expected, a lot busier than it used to be.
>
> So it seems that calculating the PG mapping is an expensive operation,
> especially when we need to retry the loop because of collisions and
> rejections.
>
> What about using a mapping cache table to look up the PG mapping
> directly once it has been calculated? We would only need to re-calculate
> the PG mapping when the osdmap epoch changes; otherwise, the PG mapping
> should stay consistent within one osdmap.
>
> Regards,
> Ning Yao
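A side note for anyone reproducing the no-cache numbers above: the RBD cache is a client-side option, so a one-line ceph.conf change along these lines should be enough (shown under the standard [client] section; verify the spelling against your release's docs):

    [client]
    rbd cache = false

The benchmark client just needs to reopen the image for the setting to take effect; nothing on the OSD side changes.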
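To make the epoch-keyed mapping cache idea concrete, here is a minimal sketch in C++. This is not Ceph's actual Objecter code: epoch_t, pg_t, PgMappingCache, and the calc callback are simplified stand-ins for illustration only.

    #include <cstdint>
    #include <functional>
    #include <string>
    #include <unordered_map>

    // Hypothetical stand-ins for the real Ceph types; illustrative only.
    using epoch_t = uint32_t;
    struct pg_t { uint64_t pool; uint32_t seed; };

    class PgMappingCache {
      epoch_t cached_epoch_ = 0;
      std::unordered_map<std::string, pg_t> table_;  // object name -> PG

     public:
      // calc is the expensive mapping step (the role _calc_target plays).
      pg_t lookup(epoch_t epoch, const std::string& oid,
                  const std::function<pg_t(const std::string&)>& calc) {
        if (epoch != cached_epoch_) {   // new osdmap: mappings may have moved
          table_.clear();
          cached_epoch_ = epoch;
        }
        auto it = table_.find(oid);
        if (it != table_.end())
          return it->second;            // hit: skip the CRUSH computation
        pg_t pg = calc(oid);            // miss: compute once per epoch
        table_.emplace(oid, pg);
        return pg;
      }
    };

One caveat with this scheme: the table gets flushed on every epoch bump even when the map change does not touch the pool in question, so the benefit depends on how quickly maps churn relative to the request rate.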