Hello,

We have a cluster of 10 Ceph servers. On that cluster there is an EC pool with a replicated SSD cache tier, used by OpenStack Cinder as volume storage for our production environment.
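The tier was created with the usual cache-tiering commands, along these lines (the pool names and the writeback mode here are illustrative, not copied from our configuration):

    ceph osd tier add cinder-ec cinder-cache
    ceph osd tier cache-mode cinder-cache writeback
    ceph osd tier set-overlay cinder-ec cinder-cache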
For the past 2 days we have been observing messages like this in the logs, for example:

2017-07-05 10:50:13.451987 osd.114 [WRN] slow request
1165.927215 seconds old, received at 2017-07-05 10:30:47.104746:
osd_op(osd.130.50779:43441 11.57a05c54
rbd_data.5bc14d3135d111a.0000000000000084 [copy-get max 8388608]
snapc 0=[]
ack+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected
e50881) currently waiting for rw locks
We've analyzed the logs and found that the RBD image listed above (rbd_data.5bc14d3135d111a) has been causing the problem from the very beginning; a rough sketch of how we pulled this out of the logs is below. The virtual machine using that volume (OpenStack uses the Ceph cluster as backend storage for Cinder) is DOWN/STOPPED, so our conclusion is that the problem lies on the cluster side, not the client side.

This unfortunately results in a huge number of blocked requests and growing RAM consumption, until the system restarts the OSD daemon and the situation starts to repeat. We've tried to temporarily mark the problematic OSDs down, but the problem just propagates to a different OSD pair.
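This is roughly how we correlated the slow-request warnings with that image, grepping the cluster log on a monitor node (the log path is the default one; the pipeline is only a sketch of the analysis we did):

    # count "waiting for rw locks" slow requests per RBD image prefix
    grep 'slow request' /var/log/ceph/ceph.log \
      | grep 'waiting for rw locks' \
      | grep -o 'rbd_data\.[0-9a-f]*' \
      | sort | uniq -c | sort -rn | head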
Running "ceph daemon osd.<ID> dump_ops_in_flight" against a problematic OSD causes that OSD to hang, and within a few minutes it is marked down by the cluster, with no output from the command.
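For reference, this is roughly what we have been running on the hosts of the affected OSDs (osd.114 is one of the reporters from the log above; the timeout wrapper is only there to illustrate that the command never returns):

    # temporarily mark one of the problematic OSDs down
    ceph osd down 114

    # query in-flight ops over the OSD's admin socket; on the problematic OSDs
    # this hangs with no output until the OSD is marked down
    timeout 30 ceph daemon osd.114 dump_ops_in_flight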
Could anyone tell us what those log messages mean? Has anyone had such a problem and could help us diagnose/repair it?

Thanks for any help