We have 12 OSDs per host, so we've gone conservative and set recovery max active to 1 and max backfills to 4. We also set nodown before adding a new OSD, since we saw that flapping can be even more problematic during recovery.
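
The nodown step mentioned above can be sketched as a small Python wrapper. This is only a minimal sketch, not anyone's actual tooling from this thread: `ceph osd set nodown` and `ceph osd unset nodown` are the standard CLI commands, but the `nodown` helper, the `run_ceph` function and the injectable `run` parameter are hypothetical names chosen for illustration. The idea is that the flag is cleared again even if the OSD addition fails partway.

```python
import subprocess
from contextlib import contextmanager


def run_ceph(args):
    """Run a ceph CLI command and raise if it exits non-zero."""
    subprocess.run(["ceph", *args], check=True)


@contextmanager
def nodown(run=run_ceph):
    """Hold the cluster-wide nodown flag for the duration of a block.

    `run` is injectable (hypothetical parameter) so the flow can be
    dry-run without a cluster.
    """
    run(["osd", "set", "nodown"])
    try:
        yield
    finally:
        # Always clear the flag, even if the wrapped step raised.
        run(["osd", "unset", "nodown"])


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    with nodown(run=lambda args: print("ceph", *args)):
        print("# add the new OSD here")
```

Leaving nodown set for longer than needed hides OSDs that really are down, so the `finally` clause is the part that matters most here.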
On Apr 12, 2013 8:04 PM, "Dave Spano" <dspano@xxxxxxxxxxxxxx> wrote:
What are your settings for recovery max active and backfill? Just curious.

Dave Spano
From: "Erdem Agaoglu" <erdem.agaoglu@xxxxxxxxx>
To: "Dave Spano" <dspano@xxxxxxxxxxxxxx>
Cc: "Stefan Priebe - Profihost AG" <s.priebe@xxxxxxxxxxxx>, "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
Sent: Friday, April 12, 2013 12:48:05 PM
Subject: Re: [ceph-users] ceph recovering results in offline VMs

We are also seeing a similar problem, which we believe is #3737. Our VMs (running mongodbs) were completely frozen for 2-3 minutes (sometimes longer) while adding a new OSD. We have reduced the recovery max active and backfill settings and made sure that RBD caching is enabled, and now things seem better. We still see some increase in iowait, but the VMs continue to function.
But I guess that depends on what the VM is actually doing at that moment. We did some fio tests before running actual services, and what we saw was that while individual read or write tests were able to survive an OSD addition with some degraded performance, concurrent read-write tests (rw and randrw in fio talk) were completely stalled. I mean the VM was able to function in individual read or write tests even if performance sometimes dropped to 0 iops, but in the rw/randrw tests it was frozen in addition to dropping to 0 iops.

BTW Stefan, I'm in no way experienced with ceph and I don't know about your OSDs, but 8128 pgs for an 8TB cluster seems too much. Or is it OK when the disks are SSDs?

On Fri, Apr 12, 2013 at 5:23 PM, Dave Spano <dspano@xxxxxxxxxxxxxx> wrote:

Very interesting. I ran into the same thing yesterday when I added SATA disks to the cluster. I was about to return them for SAS drives instead because of how long it took, and how slow some of my RBDs got.
Are most people using SATA 7200 RPM drives? My concern was with Oracle DBs. Postgres doesn't seem to have as much of a problem running on an RBD, but I noticed a marked difference with Oracle.

Dave Spano
From: "Stefan Priebe - Profihost AG" <s.priebe@xxxxxxxxxxxx>
To: "Wido den Hollander" <wido@xxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Sent: Wednesday, April 10, 2013 3:51:23 PM
Subject: Re: [ceph-users] ceph recovering results in offline VMs
Am 10.04.2013 um 21:36 schrieb Wido den Hollander <wido@xxxxxxxx>:
> On 04/10/2013 09:16 PM, Stefan Priebe wrote:
>> Hello list,
>>
>> I'm using ceph 0.56.4 and I have to replace some drives. But while ceph is
>> backfilling / recovering, all VMs have high latencies and sometimes
>> they're even offline. I replace just one drive at a time.
>>
>> I put in the new drives and I'm reweighting them from 0.0 to 1.0 in
>> 0.1 steps.
>>
>> I already lowered osd recovery max active = 2 and osd max backfills = 3,
>> but when I put them back at 1.0 the VMs are nearly all down.
>>
>> Right now some drives are SSDs, so they're a lot faster than the HDDs. I'm
>> going to replace them too.
>>
>> There is nothing in the logs, but it reports recovering at 3700MB/s; it is
>> clear that this is not possible on SATA HDDs.
>>
>> Log example:
>> 2013-04-10 20:55:33.711289 mon.0 [INF] pgmap v9293315: 8128 pgs: 233
>> active, 7876 active+clean, 19 active+recovery_wait; 557 GB data, 1168 GB
>> used, 7003 GB / 8171 GB avail; 2108KB/s wr, 329op/s; 31/309692 degraded
>> (0.010%); recovering 840 o/s, 3278MB/s
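
For readers who want to pull the numbers out of a pgmap line like the one quoted above, here is a minimal Python sketch. The regular expressions are written against this one log line only and are an assumption about the format, not a general parser for ceph log output; `parse_pgmap` is a hypothetical helper name.

```python
import re

# The pgmap line from the message above, joined onto one line.
LINE = (
    "2013-04-10 20:55:33.711289 mon.0 [INF] pgmap v9293315: 8128 pgs: 233 "
    "active, 7876 active+clean, 19 active+recovery_wait; 557 GB data, 1168 GB "
    "used, 7003 GB / 8171 GB avail; 2108KB/s wr, 329op/s; 31/309692 degraded "
    "(0.010%); recovering 840 o/s, 3278MB/s"
)


def parse_pgmap(line):
    """Extract PG state counts and the recovery rate from a pgmap log line."""
    total_pgs = int(re.search(r"(\d+) pgs:", line).group(1))
    # The state list sits between "pgs:" and the first semicolon.
    states_part = line.split("pgs:", 1)[1].split(";", 1)[0]
    states = {
        name: int(count)
        for count, name in re.findall(r"(\d+) ([a-z_+]+)", states_part)
    }
    rec = re.search(r"recovering (\d+) o/s, (\d+)MB/s", line)
    return {
        "total_pgs": total_pgs,
        "states": states,
        "objects_per_s": int(rec.group(1)),
        "mb_per_s": int(rec.group(2)),
    }


info = parse_pgmap(LINE)
# Average size per recovered object implied by the reported rates.
mb_per_object = info["mb_per_s"] / info["objects_per_s"]
print(info["states"], round(mb_per_object, 2))
```

For this line the state counts add up to the 8128 PGs reported, and the two recovery figures work out to roughly 3.9 MB per object, which is close to the 4 MB default RBD object size. Whether that explains the implausibly high MB/s figure is not something this thread settles.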
>
> There is an issue about this in the tracker, I saw it this week but I'm not able to find it anymore.
3737?
> I'm seeing this as well, when the cluster is recovering RBD images tend to get very sluggish.
>
> Most of the time I'm blaming the CPUs in the OSDs for it, but I've also seen it on faster systems.
I have 3.6 GHz Xeons with just 4 OSDs per host.
Stefan
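
The stepwise reweighting described in the original message (0.0 to 1.0 in 0.1 steps) can be sketched in Python as below. This is a sketch under assumptions: the thread does not say which command was used, so `ceph osd crush reweight` is a guess at the mechanism, and the OSD id 12 and both function names are hypothetical. The commands are only printed, not executed, and in practice one would wait for the cluster to settle between steps.

```python
def reweight_steps(start=0.0, stop=1.0, step=0.1):
    """Return the weights to apply after `start`, ending at `stop`."""
    count = round((stop - start) / step)
    # Multiply rather than accumulate to avoid floating point drift.
    return [round(start + i * step, 2) for i in range(1, count + 1)]


def reweight_commands(osd_id, weights):
    """Format one `ceph osd crush reweight` command per weight."""
    return [f"ceph osd crush reweight osd.{osd_id} {w}" for w in weights]


if __name__ == "__main__":
    # osd id 12 is a made-up example value.
    for cmd in reweight_commands(12, reweight_steps()):
        print(cmd)
```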
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
erdem agaoglu