Hello ceph-users.
Short description: during snapshot removal, OSD utilisation goes
up to 100%, which leads to slow requests and VM failures due to
IOPS stalls.
We're using OpenStack Cinder with a Ceph cluster as the volume
backend. Ceph version is 10.2.6.
We also use cinder-backup to create backups of those volumes
in Ceph, which, as far as I understand, relies on the snapshot
and layering features.
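Roughly, the setup follows the usual RBD driver configuration
in cinder.conf; the snippet below is only illustrative (section,
pool and user names are placeholders, not our exact values):

    [ceph]
    volume_driver = cinder.volume.drivers.rbd.RBDDriver
    rbd_pool = volumes
    rbd_ceph_conf = /etc/ceph/ceph.conf
    rbd_user = cinder
    rbd_flatten_volume_from_snapshot = false

    [DEFAULT]
    backup_driver = cinder.backup.drivers.ceph
    backup_ceph_pool = backups
    backup_ceph_user = cinder-backup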
The cluster consists of 5 OSD nodes with mixed SSD/HDD storage,
separate SSDs for the HDD journals, separate 10Gb/s public and
private networks, and 3 MON nodes. We also have a single
"backup" node which is responsible for the "backups" pool,
handled by CRUSH map rules.
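The CRUSH rule pinning the "backups" pool to that node looks
roughly like this (simplified, decompiled form; the bucket name
is a placeholder):

    rule backups {
            ruleset 2
            type replicated
            min_size 1
            max_size 10
            step take backup-node
            step chooseleaf firstn 0 type osd
            step emit
    }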
While creating a backup everything looks good. The backup node
is overwhelmed with load, but that's to be expected. The problem
begins when we start deleting old backups.
While an old backup is being deleted, utilisation of the main
nodes' OSDs skyrockets to 100%. This leads to slow requests in
the main storage pools, which, given enough time, can cause
process hangs, or at least SCSI reset attempts inside the
guests, and in the worst cases complete VM hangs.
I'm looking for a solution to avoid this issue.
So far I've realised that I don't understand how Ceph snapshot
mechanics work at all, because I can't figure out why deleting a
backup generates requests not to the backup OSDs, where the
backup data is actually stored, but to the main OSDs, where the
original objects reside. Is there any good documentation on this?
Googling shows that I'm not the first one to encounter this
issue, but I couldn't find an exact solution anywhere. Here's a
short list of ideas (with a rough command sketch after the list):
- set osd snap trim priority = 1. This is reported as not very
helpful, as it is already lower than the client IO priority
of 63;
- disabling the fast-diff and object-map features seems to
help, but I'm not sure what the trade-offs are for this
scenario.
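Concretely, the commands I have in mind are something like the
following (not yet verified on our cluster; the image name is
just an example):

    # idea 1: lower the snap trim priority on all OSDs at runtime
    ceph tell osd.* injectargs '--osd_snap_trim_priority 1'

    # idea 2: disable fast-diff and object-map on a volume image
    # (fast-diff depends on object-map, so both go together)
    rbd feature disable volumes/volume-<uuid> fast-diff object-map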