On 10.04.2013 at 21:36, Wido den Hollander <wido@xxxxxxxx> wrote:

> On 04/10/2013 09:16 PM, Stefan Priebe wrote:
>> Hello list,
>>
>> I'm using ceph 0.56.4 and I have to replace some drives. But while ceph
>> is backfilling / recovering, all VMs have high latencies and some are
>> even offline. I replace just one drive at a time.
>>
>> I put in the new drives and I'm reweighting them from 0.0 to 1.0 in
>> 0.1 steps.
>>
>> I already lowered osd recovery max active = 2 and osd max backfills = 3,
>> but when I put them back at 1.0 the VMs are nearly all down.
>>
>> Right now some of the new drives are SSDs, so they're a lot faster than
>> the HDDs I'm going to replace.
>>
>> There is nothing in the logs, but it reports recovering at 3700 MB/s;
>> that this is not possible on SATA HDDs is clear.
>>
>> Log example:
>> 2013-04-10 20:55:33.711289 mon.0 [INF] pgmap v9293315: 8128 pgs: 233
>> active, 7876 active+clean, 19 active+recovery_wait; 557 GB data, 1168 GB
>> used, 7003 GB / 8171 GB avail; 2108KB/s wr, 329op/s; 31/309692 degraded
>> (0.010%); recovering 840 o/s, 3278MB/s
>
> There is an issue about this in the tracker; I saw it this week but I'm
> not able to find it anymore.

3737?

> I'm seeing this as well: when the cluster is recovering, RBD images tend
> to get very sluggish.
>
> Most of the time I'm blaming the CPUs in the OSDs for it, but I've also
> seen it on faster systems.

I have 3.6 GHz Xeons with just 4 OSDs per host.

Stefan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
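For reference, the throttling and gradual reweighting described above would
typically be done along these lines from an admin node. This is a minimal
sketch, not the poster's exact commands: osd.12 and the weight values are
placeholders, and the `ceph tell osd.*` wildcard is the newer injectargs
syntax; on 0.56.x the older per-OSD form (`ceph osd tell <id> injectargs ...`)
may be required. Whether the author adjusted the crush weight or the reweight
override is not stated; the crush weight is shown here.

    # lower recovery/backfill concurrency on all OSDs at runtime
    ceph tell osd.* injectargs '--osd-recovery-max-active 1 --osd-max-backfills 1'

    # bring a freshly replaced drive in gradually, e.g. osd.12
    ceph osd crush reweight osd.12 0.1
    # wait for the cluster to return to active+clean, then step up in
    # 0.1 increments until the target weight is reached
    ceph osd crush reweight osd.12 0.2

The same two settings can also be made persistent in the [osd] section of
ceph.conf, but injectargs avoids restarting the daemons mid-recovery.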