Xiaoxi,
Here is the luminous patch which performs a StupidAllocator reset once
every 12 hours.
https://github.com/ceph/ceph/tree/wip-ifed-reset-allocator-luminous
Sorry, didn't have enough time today to learn how to make a package from
it, just sources for now.
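For reference, the idea is roughly the following (an illustrative sketch
only, not the actual patch; reset_allocator() and kAllocResetInterval are
made-up names):

    // Illustrative sketch only -- not the actual wip-ifed patch.
    #include <atomic>
    #include <chrono>
    #include <thread>

    constexpr auto kAllocResetInterval = std::chrono::hours(12);
    std::atomic<bool> stop_flag{false};

    void reset_allocator() {
      // In the real patch this would rebuild the allocator's in-memory
      // state from the freelist manager; stubbed out here.
    }

    void allocator_reset_loop() {
      // Wake up every 12 hours and reset the allocator, discarding
      // whatever the in-memory structures have degraded into.
      while (!stop_flag.load()) {
        std::this_thread::sleep_for(kAllocResetInterval);
        if (!stop_flag.load())
          reset_allocator();
      }
    }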
Thanks,
Igor
On 3/1/2019 11:46 AM, Xiaoxi Chen wrote:
Igor,
I can test the patch if we have a package.
My environment and workload can consistently reproduce the latency
issue 2-3 days after restarting.
Sage told me to try the bitmap allocator to make sure the stupid
allocator is the bad guy. I have some OSDs on luminous+bitmap and
some OSDs on 14.1.0+bitmap. Both look positive so far, but I need
more time to be sure.
The perf, log and admin socket analysis leads to the theory that
in alloc_int the loop sometimes takes a long time with the allocator locks
held. This blocks the release part called from _txc_finish in
kv_finalize_thread, and that thread is also the one that calculates
state_kv_committing_lat and the overall commit_lat. You can see from the
admin socket that state_done_latency has a similar trend to commit_latency.
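To make the suspected pattern concrete, here is a toy sketch (the
ToyAllocator class and every name in it are made up for illustration;
this is not the actual BlueStore or StupidAllocator code):

    #include <cstdint>
    #include <map>
    #include <mutex>

    class ToyAllocator {
      std::mutex lock;                        // one lock guards everything
      std::map<uint64_t, uint64_t> free_map;  // offset -> length
    public:
      int64_t allocate(uint64_t want) {
        std::lock_guard<std::mutex> l(lock);
        // A long scan over a fragmented free map runs entirely under
        // the lock -- this is the loop suspected of taking a long time.
        for (auto it = free_map.begin(); it != free_map.end(); ++it) {
          if (it->second >= want) {
            uint64_t off = it->first;
            free_map.erase(it);               // carve out the extent
            return (int64_t)off;
          }
        }
        return -1;                            // no extent large enough
      }
      void release(uint64_t off, uint64_t len) {
        // Called from the kv finalize thread; it blocks here while
        // allocate() holds the lock, and that wait shows up in the
        // commit latency the same thread is measuring.
        std::lock_guard<std::mutex> l(lock);
        free_map[off] = len;                  // coalescing omitted
      }
    };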
But we cannot find a theory to explain why a reboot helps: the
allocator btree will be rebuilt from the freelist manager and it should be
exactly the same as it was prior to the reboot. Anything related to PG
recovery?
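Just to illustrate why this puzzles me, a made-up toy example (not
Ceph code): rebuilding the map from the same persisted extents twice gives
an identical structure, so a restart alone should not change anything:

    #include <cassert>
    #include <cstdint>
    #include <map>
    #include <utility>
    #include <vector>

    using FreeMap = std::map<uint64_t, uint64_t>;  // offset -> length

    // Stand-in for walking the persisted freelist on startup.
    FreeMap rebuild(const std::vector<std::pair<uint64_t, uint64_t>>& persisted) {
      FreeMap m;
      for (const auto& e : persisted)
        m.emplace(e.first, e.second);
      return m;
    }

    int main() {
      std::vector<std::pair<uint64_t, uint64_t>> persisted = {
        {0, 4096}, {65536, 8192}, {262144, 4096}};
      FreeMap before = rebuild(persisted);  // in-memory map before "restart"
      FreeMap after  = rebuild(persisted);  // map rebuilt after "restart"
      assert(before == after);  // same content, yet latency drops after reboot
      return 0;
    }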
Anyway, as I have a live environment and workload, I am more than
willing to work with you on further investigation.
-Xiaoxi
Igor Fedotov <ifedotov@xxxxxxx> wrote on Fri, Mar 1, 2019 at 6:21 AM:
Also I think it makes sense to create a ticket at this point. Any
volunteers?
On 3/1/2019 1:00 AM, Igor Fedotov wrote:
> Wondering if somebody would be able to apply a simple patch that
> periodically resets StupidAllocator?
>
> Just to verify/disprove the hypothesis that it's allocator related.
>
> On 2/28/2019 11:57 PM, Stefan Kooman wrote:
>> Quoting Wido den Hollander (wido@xxxxxxxx):
>>> Just wanted to chime in, I've seen this with Luminous+BlueStore+NVMe
>>> OSDs as well. Over time their latency increased until we started to
>>> notice I/O-wait inside VMs.
>> On a Luminous 12.2.8 cluster with only SSDs we also hit this issue, I
>> guess. After restarting the OSD servers the latency would drop to normal
>> values again. See https://owncloud.kooman.org/s/BpkUc7YM79vhcDj
>>
>> Reboots were finished at ~ 19:00.
>>
>> Gr. Stefan
>>