Hi,

I'm doing some tests on a cluster, or at least on part of it. I have several CRUSH rules distributing data over different types of OSDs. The only part of interest today is a pool of 768 PGs spread across 4 servers and 8 OSDs (2 OSDs per server). The pool is almost empty, with roughly 5-10 GB of data in it.

As a test, I shut down one of the servers and let the cluster redistribute the data. It's taking forever... One hour after the start it's still not done, and the status looks something like this:

2013-07-04 13:53:42.393478 mon.0 [INF] pgmap v16947951: 12808 pgs: 12603 active+clean, 6 active+degraded+wait_backfill, 171 active+recovery_wait, 1 active+degraded+backfilling, 1 active+degraded+remapped+wait_backfill, 26 active+recovering; 796 GB data, 1877 GB used, 12419 GB / 14296 GB avail; 764KB/s rd, 122KB/s wr, 68op/s; 158783/2934125 degraded (5.412%); recovering 93 o/s, 1790KB/s

Network, disk and CPU usage on the OSDs are all very low, so I'm not sure why recovery is so slow. Some throttling so that real client requests get priority is fine, but here it seems a bit too much: recovery is doing less than 1 MB/s of network transfer or disk writes.

Does anyone have an explanation, or better, a solution?

Cheers,
Sylvain
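P.S. In case it's relevant, here is roughly what I've been looking at on the OSD side. I'm not sure these are the right knobs, and the admin socket path / OSD id below are just from one of my nodes, so adjust as needed:

    # dump the current recovery/backfill throttle values from a running OSD
    # (the admin socket path may differ on your install)
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep -E 'recovery|backfill'

    # what I was thinking of trying to loosen the throttling, if these are indeed the right options
    ceph tell osd.* injectargs '--osd-max-backfills 10 --osd-recovery-max-active 15'

I'd rather understand why the defaults are crawling like this before changing anything, though.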