Re: ceph osd commit latency increase over time, until restart

Adam Kupczyk <akupczyk@xxxxxxxxxx> · Tue, 12 Feb 2019 10:24:37 +0100



    Hi Chen,
    thanks for the update. I will prepare a patch to periodically reset
      StupidAllocator today.

    
    And just to let you know, below is an e-mail from AdamK from RH
      which might explain the issue with the allocator.
    Also please note that StupidAllocator might not perform full
      defragmentation at run time. That's why we observed (as mentioned
      somewhere in the thread) fragmentation growing while the OSD is running
      and dropping on restart. Such a restart rebuilds the internal tree and
      eliminates the defragmentation flaws. Maybe that's the case here.

    
    Thanks,
    Igor

    
    -------- Forwarded Message --------
    Subject: High CPU in StupidAllocator
    Date: Tue, 12 Feb 2019 10:24:37 +0100
    From: Adam Kupczyk <akupczyk@xxxxxxxxxx>
    To: IGOR FEDOTOV <ifed75@xxxxxxxxx>

    Hi Igor,
    

    I have observed that StupidAllocator can burn a lot of CPU in
      StupidAllocator::allocate_int().

    
    This comes from loops:

      while (p != free[bin].end()) {
        if (_aligned_len(p, alloc_unit) >= want_size) {
          goto found;
        }
        ++p;
      }

    
    It happens when want_size is close to the upper limit of the
      bin's size range.

    For example, free[5] contains sizes 8192..16383.

    When requesting a size like 16000, it is quite likely that
      multiple chunks must be checked, because any chunk in that bin
      shorter than 16000 fails the test.
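
    To make the effect concrete, here is a toy model of a single bin,
      not the actual Ceph code. The names chunks_examined and
      make_toy_bin are hypothetical, and the evenly spaced chunk
      lengths are an assumption chosen only for illustration. It counts
      how many chunks a linear first-fit scan has to look at for a
      given request size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: number of chunks a linear first-fit scan examines
// before it finds one with length >= want, or runs off the end of the bin.
std::size_t chunks_examined(const std::vector<std::uint64_t>& bin,
                            std::uint64_t want) {
  assert(want > 0);  // a zero-length request is meaningless here
  std::size_t examined = 0;
  for (std::uint64_t len : bin) {
    ++examined;
    if (len >= want) {
      break;  // corresponds to "goto found" in the loop quoted above
    }
  }
  return examined;
}

// Assumption for illustration: a bin covering 8192..16383 that holds
// free chunks with lengths 8192, 8692, 9192, ... (step of 500 bytes).
std::vector<std::uint64_t> make_toy_bin() {
  std::vector<std::uint64_t> bin;
  for (std::uint64_t len = 8192; len < 16384; len += 500) {
    bin.push_back(len);
  }
  return bin;
}
```

    With this toy bin, a request for 9000 bytes stops at the third
      chunk, while a request for 16000 bytes walks the whole bin before
      reaching the only chunk that is large enough.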

      
    I have made an attempt to improve it by increasing the number of
      buckets.

    
    It is done in aclamk/wip-bs-stupid-allocator-2.
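
    As a rough illustration of why more buckets can help, the sketch
      below splits the 8192..16383 range into four equal sub-bins. This
      is only one hypothetical scheme and is not taken from the branch
      named above; kSubBins, sub_bin_of and sub_bin_floor are made-up
      names for this example.

```cpp
#include <cassert>
#include <cstdint>

// Assumed range of the original coarse bin (upper bound is exclusive).
constexpr std::uint64_t kLow = 8192;
constexpr std::uint64_t kHigh = 16384;
// Hypothetical number of equal-width sub-bins inside that range.
constexpr unsigned kSubBins = 4;

// Index of the sub-bin that would hold a chunk (or request) of length len.
unsigned sub_bin_of(std::uint64_t len) {
  assert(len >= kLow && len < kHigh);
  return static_cast<unsigned>((len - kLow) * kSubBins / (kHigh - kLow));
}

// Smallest chunk length stored in sub-bin idx.
std::uint64_t sub_bin_floor(unsigned idx) {
  assert(idx < kSubBins);
  return kLow + idx * (kHigh - kLow) / kSubBins;
}
```

    Under these assumptions a request for 16000 bytes starts in
      sub-bin 3, whose chunks are all at least 14336 bytes long, so
      chunks between 8192 and 14335 bytes are never examined.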
    

    Best regards, 

    
    Adam Kupczyk
    

    On 3/1/2019 11:46 AM, Xiaoxi Chen wrote:

    
        Igor,

        I can test the patch if we have a package.
        My environment and workload can consistently
        reproduce the latency 2-3 days after restarting.

        Sage told me to try the bitmap allocator to
        make sure the stupid allocator is the bad guy. I have some OSDs
        on Luminous + bitmap and some OSDs on 14.1.0 + bitmap. Both
        look positive so far, but I need more time to be sure.

        The perf, log, and admin socket analysis
        leads to the theory that in alloc_int the loop sometimes takes
        a long time with allocator locks held. This blocks the release
        part called from _txc_finish in kv_finalize_thread; this
        thread is also the one that calculates state_kv_committing_lat
        and the overall commit_lat. You can see from the admin socket that
        state_done_latency has a similar trend to commit_latency.

        But we cannot find a theory to explain why a reboot helps:
        the allocator btree will be rebuilt from the freelist manager
        and it should be exactly the same as it was prior to the reboot.
        Is anything related to PG recovery?

        Anyway, as I have a live environment and workload, I
        am more than willing to work with you on further
        investigation.
        

        -Xiaoxi
        

          Igor Fedotov <ifedotov@xxxxxxx>
            wrote on Fri, Mar 1, 2019 at 6:21 AM:

          
            Also, I think it makes sense to create a ticket at this point.
            Any volunteers?

            
            On 3/1/2019 1:00 AM, Igor Fedotov wrote:

            > Wondering if somebody would be able to apply a simple patch that
            > periodically resets StupidAllocator?
            >
            > Just to verify/disprove the hypothesis that it's allocator related.
            >
            > On 2/28/2019 11:57 PM, Stefan Kooman wrote:
            >> Quoting Wido den Hollander (wido@xxxxxxxx):
            >>> Just wanted to chime in, I've seen this with Luminous+BlueStore+NVMe
            >>> OSDs as well. Over time their latency increased until we started to
            >>> notice I/O-wait inside VMs.
            >> On a Luminous 12.2.8 cluster with only SSDs we also hit this issue I
            >> guess. After restarting the OSD servers the latency would drop to normal
            >> values again. See https://owncloud.kooman.org/s/BpkUc7YM79vhcDj
            >>
            >> Reboots were finished at ~ 19:00.
            >>
            >> Gr. Stefan
            >>

          
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com