Hi,

I'm doing some tests on a cluster, or at least on part of it. I have several CRUSH rules distributing data over different types of OSDs. The only part of interest today is a pool of 768 PGs spread across 4 servers and 8 OSDs (2 OSDs per server). The pool is almost empty, with roughly 5-10 GB of data in it.

As a test, I shut down one of the servers and let the cluster redistribute the data. It's taking forever... One hour after the start it's still not done, and the status looks something like this:

2013-07-04 13:53:42.393478 mon.0 [INF] pgmap v16947951: 12808 pgs: 12603 active+clean, 6 active+degraded+wait_backfill, 171 active+recovery_wait, 1 active+degraded+backfilling, 1 active+degraded+remapped+wait_backfill, 26 active+recovering; 796 GB data, 1877 GB used, 12419 GB / 14296 GB avail; 764KB/s rd, 122KB/s wr, 68op/s; 158783/2934125 degraded (5.412%); recovering 93 o/s, 1790KB/s

Network, disk and CPU usage on the OSDs are all very low, so I'm not sure why recovery is so slow. Some throttling so that real client requests get priority is fine, but here it seems a bit too much: recovery is doing less than 1 MB/s of network transfer or disk writes.

Does anyone have an explanation, or better, a solution?

Cheers,
Sylvain
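P.S. In case it's relevant, here is roughly what I've been looking at on the OSD side. I'm not sure these are the right knobs, and the admin socket path / OSD id below are just from one of my nodes, so adjust as needed:

    # dump the current recovery/backfill throttle values from a running OSD
    # (the admin socket path may differ on your install)
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show | grep -E 'recovery|backfill'

    # what I was thinking of trying to loosen the throttling, if these are indeed the right options
    ceph tell osd.* injectargs '--osd-max-backfills 10 --osd-recovery-max-active 15'

I'd rather understand why the defaults are crawling like this before changing anything, though.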