Hello,

We are running 0.80.5 on our production cluster and we are seeing slow requests when deleting rbd snapshots. We have now reduced snapshot counts to 4 weeklies, but the snapshot count does not seem to be a factor in this problem. The cluster is practically unresponsive for so long that clients time out.

Here are the ten slowest requests per OSD from last night (times in seconds):

     #  Log file                        Seconds
     1  /var/log/ceph/ceph-osd.46.log   1920
     2  /var/log/ceph/ceph-osd.42.log   1455
     3  /var/log/ceph/ceph-osd.74.log   1292
     4  /var/log/ceph/ceph-osd.77.log   1170
     5  /var/log/ceph/ceph-osd.48.log   1083
     6  /var/log/ceph/ceph-osd.0.log     960
     7  /var/log/ceph/ceph-osd.40.log    960
     8  /var/log/ceph/ceph-osd.57.log    960
     9  /var/log/ceph/ceph-osd.61.log    960
    10  /var/log/ceph/ceph-osd.76.log    960

Some OSDs report no slow requests at all; the slow requests are not evenly distributed across OSDs.

Currently we run journals on the OSD SATA drives, but we are considering upgrading to SSD journals. However, we do not have any performance problems other than when deleting snapshots.

Is there any way to mitigate the problem other than investing in SSD journals?

-- 
Eino Tuominen
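
For reference, a ranking like the per-OSD list above could be produced with something like the following. This is only a minimal sketch: it assumes the OSD logs contain lines of the form "slow request N seconds old", and the function name `slowest_per_log` and the regular expression are illustrative assumptions, not part of Ceph.

```python
import glob
import re

# Assumed log line shape (an assumption, adjust to the actual logs):
#   "... slow request 960.123456 seconds old, received at ..."
SLOW_RE = re.compile(r"slow request (\d+(?:\.\d+)?) seconds old")


def slowest_per_log(lines_by_log, top=10):
    """Return [(log_name, max_seconds)] for the `top` worst logs, slowest first.

    `lines_by_log` maps a log name to an iterable of its lines.
    Logs without any matching line are left out of the result.
    """
    worst = {}
    for name, lines in lines_by_log.items():
        for line in lines:
            match = SLOW_RE.search(line)
            if match is None:
                continue
            seconds = float(match.group(1))
            if seconds > worst.get(name, 0.0):
                worst[name] = seconds
    ranked = sorted(worst.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


if __name__ == "__main__":
    # Path pattern taken from the log file names listed above.
    logs = {}
    for path in glob.glob("/var/log/ceph/ceph-osd.*.log"):
        with open(path, errors="replace") as handle:
            logs[path] = handle.readlines()
    for rank, (path, seconds) in enumerate(slowest_per_log(logs), start=1):
        print(f"{rank:2d}  {path}  {seconds:.0f}")
```

Only the worst value per log file is kept, so an OSD that logged many slow requests still appears once, which matches how the list above is laid out.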