2017-05-11 8:21 GMT+08:00 Mark Nelson <mnelson@xxxxxxxxxx>:
>
> On 05/10/2017 06:24 PM, Mark Nelson wrote:
>>
>> On 05/10/2017 05:31 PM, Jason Dillaman wrote:
>>>
>>> On Wed, May 10, 2017 at 6:10 PM, Mark Nelson <mnelson@xxxxxxxxxx> wrote:
>>>>
>>>> 1) 7 - tp_librbd, line 82
>>>>
>>>> Lots of stuff going on here, but the big thing is all the time spent in
>>>> librbd::ImageCtx::write_to_cache. 70.2% of the total time in this thread is
>>>> spent in ObjectCacher::writex with lots of nested stuff, but if you look all
>>>> the way down on line 1293, another 11.8% of the time is spent in Locker()
>>>> and 1.5% of the time spent in ~Locker().
>>>
>>> Yes -- the ObjectCacher is long overdue for a re-write since it's
>>> single threaded. It looks like you were essentially performing
>>> writethrough as well. I'd imagine you would just be better off
>>> disabling the rbd cache when doing high-performance random write
>>> workloads since you are going to get zero benefit from the cache with
>>> that workload -- at least that's what I usually recommend.
>>
>> Often I do turn rbd cache off for bluestore testing. This was an older
>> conf file where I inadvertently hadn't disabled it. Still, it's an
>> unfortunate choice that has to be made, potentially by someone other
>> than the user running the workload. :/
>
> Yep, disabling rbd cache bumped 4K write IOPS from ~14K to ~31K, and closer
> ~36-37K with a higher IO depth at around 410% CPU usage.
>
> trace without rbd cache here:
>
> https://pastebin.com/t8FFsWNb
>
> looking a lot better, though thread 5 (tp_librbd) is still pegging. Still a
> little bit of locking in various places. a bit more time shifted into
> _calc_target.
>
> async msgr is as expected a lot busier than it used to be.

So it seems that calculating the PG mapping is an expensive operation, even
more so if we always need to retry the loop because of collisions and
rejections.
What about using a mapping cache table to look up the PG mapping directly
once it has been calculated? I think we only need to recalculate the PG
mapping when the osdmap epoch changes; within a single osdmap epoch, the PG
mapping should stay consistent.

Regards,
Ning Yao
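
For illustration, a minimal sketch of the epoch-keyed cache idea above. It is written in Python rather than against the Ceph code, and every name in it (PgMappingCache, lookup, fake_crush, the (pool, seed) key) is hypothetical; fake_crush only stands in for the real, expensive mapping calculation.

```python
from typing import Callable, Dict, Optional, Tuple

PgId = Tuple[int, int]        # hypothetical key: (pool id, placement seed)
OsdSet = Tuple[int, ...]      # hypothetical value: ordered OSD ids for the PG


class PgMappingCache:
    """Memoize PG -> OSD mappings for as long as the osdmap epoch is unchanged."""

    def __init__(self, compute: Callable[[int, PgId], OsdSet]) -> None:
        self._compute = compute            # stand-in for the expensive calculation
        self._epoch: Optional[int] = None  # epoch the cached table belongs to
        self._table: Dict[PgId, OsdSet] = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, epoch: int, pgid: PgId) -> OsdSet:
        # A new epoch may change any mapping, so drop the whole table.
        if epoch != self._epoch:
            self._table.clear()
            self._epoch = epoch
        if pgid in self._table:
            self.hits += 1
            return self._table[pgid]
        # First request for this PG in this epoch: compute once and remember.
        self.misses += 1
        osds = self._compute(epoch, pgid)
        self._table[pgid] = osds
        return osds


def fake_crush(epoch: int, pgid: PgId) -> OsdSet:
    """Toy deterministic placeholder; NOT the CRUSH algorithm."""
    pool, seed = pgid
    base = (pool * 31 + seed * 7 + epoch) % 10
    return (base, (base + 1) % 10, (base + 2) % 10)


cache = PgMappingCache(fake_crush)
first = cache.lookup(5, (1, 42))    # miss: computed and stored
second = cache.lookup(5, (1, 42))   # hit: same epoch, served from the table
third = cache.lookup(6, (1, 42))    # miss: epoch changed, table was cleared
```

Clearing the whole table on an epoch change is the simplest policy that matches the assumption that mappings are stable within one osdmap; whether the lookup cost and memory are worth it on the real client path would need measuring.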