On 04/10/2013 09:16 PM, Stefan Priebe wrote:
Hello list,

I'm using ceph 0.56.4 and I have to replace some drives. But while ceph is backfilling/recovering, all VMs see high latencies and are sometimes even offline. I only replace one drive at a time: I put in the new drives and reweight them from 0.0 to 1.0 in 0.1 steps. I have already lowered osd recovery max active = 2 and osd max backfills = 3, but when I bring them back to 1.0 nearly all the VMs go down.

Some of the new drives are SSDs, so they are a lot faster than the HDDs I'm replacing. There is nothing in the logs, but the cluster reports recovering at 3700MB/s, which is clearly not possible on SATA HDDs.

Log example:

2013-04-10 20:55:33.711289 mon.0 [INF] pgmap v9293315: 8128 pgs: 233 active, 7876 active+clean, 19 active+recovery_wait; 557 GB data, 1168 GB used, 7003 GB / 8171 GB avail; 2108KB/s wr, 329op/s; 31/309692 degraded (0.010%); recovering 840 o/s, 3278MB/s
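For reference, the procedure described above (throttle recovery, then step the CRUSH weight up in 0.1 increments) can be sketched roughly like this. The osd id (12) is a placeholder, and RUN=echo makes it a dry run that only prints the commands; the injectargs options are the ones named in the post:

```shell
#!/bin/sh
# Sketch of a gradual drive replacement, assuming osd.12 is the new drive.
OSD_ID=12
RUN=echo   # set RUN="" to actually execute against a cluster

# Throttle recovery/backfill on all OSDs before raising the weight.
$RUN ceph tell osd.\* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

# Step the CRUSH weight up in 0.1 increments.
for w in 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0; do
    $RUN ceph osd crush reweight osd.$OSD_ID $w
    # In a real run, wait for recovery to settle between steps, e.g.:
    # until ceph health | grep -q HEALTH_OK; do sleep 30; done
done
```

Lower values for osd max backfills and osd recovery max active (1 instead of 2-3) and waiting for HEALTH_OK between steps may help keep client I/O responsive, at the cost of a slower rebalance.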
There is an issue about this in the tracker; I saw it this week but I can't find it anymore.
I'm seeing this as well: when the cluster is recovering, RBD images tend to get very sluggish.

Most of the time I blame the CPUs in the OSD nodes for it, but I've also seen it on faster systems.
Greets,
Stefan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
Wido den Hollander
42on B.V.

Phone: +31 (0)20 700 9902
Skype: contact42on