Hi,

Our setup currently comprises 5 OSD nodes with 12 OSDs each, for a total of 60 OSDs. All of them are SSDs, with 4 SSD journals per node. The Ceph version is hammer v0.94.1. There is a performance overhead because we're using SSDs (I've heard this gets better in infernalis, but we're not upgrading just yet), but we can reach numbers I would consider "alright".

Now, the issue is that when the cluster goes into recovery, it's very fast at first but then slows down to ridiculous levels as it progresses. It can go from 7% left to recover down to 2% in ten minutes, yet take 2 hours to recover the last 2%. While this happens, the attached OpenStack setup becomes incredibly slow, even though only a small fraction of objects (less than 1%) is still recovering.

The settings that may affect recovery speed are very low, as they are by default, yet they still affect client I/O far more than they should. Why would Ceph recovery become so slow as it progresses, and hurt client I/O even though it's recovering at a snail's pace? And by a snail's pace, I mean a few KB/second on 10 Gbps uplinks.

--
======================
Jean-Philippe Méthot
Administrateur système / System administrator
GloboTech Communications
Phone: 1-514-907-0050
Toll Free: 1-(888)-GTCOMM1
Fax: 1-(514)-907-0750
jpmethot@xxxxxxxxxx
http://www.gtcomm.net

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
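[Editor's note: the recovery/backfill throttles the poster refers to can be inspected and tuned at runtime. The commands below are a sketch; `osd.0` and the values shown are illustrative, not the poster's actual settings.]

```shell
# Inspect the live recovery/backfill throttles on one OSD via its admin socket
ceph daemon osd.0 config show | grep -E 'osd_max_backfills|osd_recovery_max_active|osd_recovery_op_priority'

# Lower them further at runtime on all OSDs (illustrative values;
# injectargs changes are not persisted across OSD restarts)
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1 --osd-recovery-op-priority 1'
```

To make such changes permanent, the same options would go under the `[osd]` section of ceph.conf on each node.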