On 04/10/2013 09:16 PM, Stefan Priebe wrote:
Hello list,

I'm using ceph 0.56.4 and I have to replace some drives. But while ceph is backfilling/recovering, all VMs see high latencies and are sometimes even offline. I only replace one drive at a time: I put in the new drives and reweight them from 0.0 to 1.0 in 0.1 steps. I have already lowered osd recovery max active = 2 and osd max backfills = 3, but when I bring them back to 1.0 nearly all the VMs go down.

Some of the new drives are SSDs, so they are a lot faster than the HDDs I'm replacing. There is nothing in the logs, but the cluster reports recovering at 3700MB/s, which is clearly not possible on SATA HDDs.

Log example:

2013-04-10 20:55:33.711289 mon.0 [INF] pgmap v9293315: 8128 pgs: 233 active, 7876 active+clean, 19 active+recovery_wait; 557 GB data, 1168 GB used, 7003 GB / 8171 GB avail; 2108KB/s wr, 329op/s; 31/309692 degraded (0.010%); recovering 840 o/s, 3278MB/s
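For reference, the procedure described above (throttle recovery, then step the CRUSH weight up in 0.1 increments) can be sketched roughly like this. The osd id (12) is a placeholder, and RUN=echo makes it a dry run that only prints the commands; the injectargs options are the ones named in the post:

```shell
#!/bin/sh
# Sketch of a gradual drive replacement, assuming osd.12 is the new drive.
OSD_ID=12
RUN=echo   # set RUN="" to actually execute against a cluster

# Throttle recovery/backfill on all OSDs before raising the weight.
$RUN ceph tell osd.\* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

# Step the CRUSH weight up in 0.1 increments.
for w in 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0; do
    $RUN ceph osd crush reweight osd.$OSD_ID $w
    # In a real run, wait for recovery to settle between steps, e.g.:
    # until ceph health | grep -q HEALTH_OK; do sleep 30; done
done
```

Lower values for osd max backfills and osd recovery max active (1 instead of 2-3) and waiting for HEALTH_OK between steps may help keep client I/O responsive, at the cost of a slower rebalance.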
There is an issue about this in the tracker; I saw it this week but I can't find it anymore.
I'm seeing this as well: when the cluster is recovering, RBD images tend to get very sluggish.

Most of the time I blame the CPUs in the OSD nodes for it, but I've also seen it on faster systems.
Greets,
Stefan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on