Massive slow requests cause OSD daemon to eat all RAM

Hello

We have a cluster of 10 ceph servers.

On that cluster there is an EC pool with a replicated SSD cache tier, used by OpenStack Cinder for volume storage in a production environment.

For the past 2 days we have been seeing messages like this in the logs:

2017-07-05 10:50:13.451987 osd.114 [WRN] slow request 1165.927215 seconds old, received at 2017-07-05 10:30:47.104746: osd_op(osd.130.50779:43441 11.57a05c54 rbd_data.5bc14d3135d111a.0000000000000084 [copy-get max 8388608] snapc 0=[] ack+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected e50881) currently waiting for rw locks

In this example:

  • osd.114 is on the HDD backend that holds the EC pool
  • osd.130 is on the SSD cache tier
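To see which OSD pairs and which RBD prefixes the slow requests concentrate on, the warnings can be tallied from the cluster log. The following is only a minimal sketch: the regular expression is written against the single sample line above and may need adjusting for other Ceph versions, and the names parse_slow_request and tally are hypothetical helpers, not part of any Ceph tool.

```python
import re
from collections import Counter

# Sample line copied from the log excerpt above.
SAMPLE = (
    "2017-07-05 10:50:13.451987 osd.114 [WRN] slow request 1165.927215 seconds old, "
    "received at 2017-07-05 10:30:47.104746: osd_op(osd.130.50779:43441 11.57a05c54 "
    "rbd_data.5bc14d3135d111a.0000000000000084 [copy-get max 8388608] snapc 0=[] "
    "ack+read+rwordered+ignore_cache+ignore_overlay+map_snap_clone+known_if_redirected "
    "e50881) currently waiting for rw locks"
)

# Assumption: the layout matches the sample line; this is not an official format spec.
SLOW_RE = re.compile(
    r"(?P<reporter>osd\.\d+) \[WRN\] slow request (?P<age>[\d.]+) seconds old, "
    r"received at (?P<received>\S+ \S+): "
    r"osd_op\((?P<source>(?:osd|client)\.\d+)\.\S+ \S+ (?P<object>\S+) "
    r".* currently (?P<state>.+)$"
)


def parse_slow_request(line):
    """Return a dict describing one slow-request warning, or None if no match."""
    match = SLOW_RE.search(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["age"] = float(record["age"])
    # Drop the trailing object index to get the per-image prefix.
    record["prefix"] = record["object"].rsplit(".", 1)[0]
    return record


def tally(lines):
    """Count warnings per (reporting OSD, requesting entity, RBD prefix)."""
    counts = Counter()
    for line in lines:
        record = parse_slow_request(line)
        if record is not None:
            counts[(record["reporter"], record["source"], record["prefix"])] += 1
    return counts


if __name__ == "__main__":
    for key, count in tally([SAMPLE]).most_common():
        print(count, *key)
```

Feeding it the lines of the cluster log instead of the single sample would show whether the warnings keep pointing at the same rbd_data prefix after the problem moves to another OSD pair.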

We've analyzed the logs and found that the RBD image listed above [rbd_data.5bc14d3135d111a] has been causing the problem from the very beginning. The virtual machine using it (OpenStack uses the Ceph cluster as backend storage for Cinder) is DOWN/STOPPED, so we conclude that the problem lies on the cluster side, not the client side.

Unfortunately, this results in a huge number of blocked requests and very high RAM consumption. As a result, the system restarts the OSD daemon, and the situation repeats.

We've tried temporarily marking the problematic OSDs down, but the problem then propagates to a different OSD pair.

Running "ceph daemon osd.<ID> dump_ops_in_flight" on the problematic OSDs causes the OSD to hang, and within a few minutes it is marked down by the cluster, with no response from the command.

The SSD model used for the SSD cache tier pool is: SAMSUNG MZ7KM240

Could anyone tell us what these log messages mean? Has anyone had such a problem and could help us diagnose/repair it?

Thanks for any help

-------------------------------------------------
Pawel Woszuk
PSNC, Poznan Supercomputing and Networking Center
Poznań, Poland
