ceph osd crush tunables optimal AND add new OSD at the same time

Sage, 

Would it help to add a cache pool to the cluster? Let's say we added a few TBs of SSDs acting as a cache pool; would that help the guest vms keep getting IO during data recovery or reshuffling? 
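
For reference, the firefly commands for layering a cache pool over an existing data pool look roughly like the sketch below; the pool names and pg count are just placeholders, and in practice the hit_set_* and target_max_bytes settings would also need tuning per cluster: 

ceph osd pool create ssd-cache 512 
ceph osd tier add rbd ssd-cache 
ceph osd tier cache-mode ssd-cache writeback 
ceph osd tier set-overlay rbd ssd-cache 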

Over the past year and a half that we've been using ceph, our experience has been positive for the majority of the time. The only downtime we've had for our vms was when ceph was doing recovery. It seems that regardless of the tuning options we've used, our vms are still unable to get IO: they climb to 98-99% iowait and freeze. This has happened on the dumpling, emperor and now firefly releases. Because of this I've set the noout flag on my cluster and have to keep an eye on the osds for manual intervention, which is far from ideal. 
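
The noout workaround itself is just the cluster-wide flag, set before the osds go down and cleared once they are back: 

ceph osd set noout 
# ... restart or maintain osds without them being marked out (and triggering rebalancing) ... 
ceph osd unset noout 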

Andrei 

-- 
Andrei Mikhailovsky 
Director 
Arhont Information Security 

Web: http://www.arhont.com 
http://www.wi-foo.com 
Tel: +44 (0)870 4431337 
Fax: +44 (0)208 429 3111 
PGP: Key ID - 0x2B3438DE 
PGP: Server - keyserver.pgp.com 

DISCLAIMER 

The information contained in this email is intended only for the use of the person(s) to whom it is addressed and may be confidential or contain legally privileged information. If you are not the intended recipient you are hereby notified that any perusal, use, distribution, copying or disclosure is strictly prohibited. If you have received this email in error please immediately advise us by return email at andrei at arhont.com and delete and purge the email and any attachments without making a copy. 


----- Original Message -----

From: "Sage Weil" <sweil@xxxxxxxxxx> 
To: "Gregory Farnum" <greg at inktank.com> 
Cc: ceph-users at lists.ceph.com 
Sent: Thursday, 17 July, 2014 1:06:52 AM 
Subject: Re: ceph osd crush tunables optimal AND add new OSD at the same time 

On Wed, 16 Jul 2014, Gregory Farnum wrote: 
> On Wed, Jul 16, 2014 at 4:45 PM, Craig Lewis <clewis at centraldesktop.com> wrote: 
> > One of the things I've learned is that many small changes to the cluster are 
> > better than one large change. Adding 20% more OSDs? Don't add them all at 
> > once, trickle them in over time. Increasing pg_num & pgp_num from 128 to 
> > 1024? Go in steps, not one leap. 
> > 
> > I try to avoid operations that will touch more than 20% of the disks 
> > simultaneously. When I had journals on HDD, I tried to avoid going over 10% 
> > of the disks. 
> > 
> > 
> > Is there a way to execute `ceph osd crush tunables optimal` in a way that 
> > takes smaller steps? 
> 
> Unfortunately not; the crush tunables are changes to the core 
> placement algorithms at work. 
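
(As an aside, the pg_num stepping Craig describes is just repeated pool sets; the pool name below is a placeholder and the step sizes are only an example: 

ceph osd pool set rbd pg_num 256 
ceph osd pool set rbd pgp_num 256 
# wait for the cluster to return to active+clean, then repeat with 512, then 1024 

The tunables are the harder part.) 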

Well, there is one way, but it is only somewhat effective. If you 
decompile the crush maps for bobtail vs. firefly, the actual difference is 

tunable chooseleaf_vary_r 1 

and this is written such that a value of 1 is the optimal 'new' way, 0 is 
the legacy old way, but values > 1 are less-painful steps between the two 
(though mostly closer to the firefly value of 1). So, you could set 

tunable chooseleaf_vary_r 4 

wait for it to settle, and then do 

tunable chooseleaf_vary_r 3 

...and so forth down to 1. I did some limited testing of the data 
movement involved and noted it here: 

https://github.com/ceph/ceph/commit/37f840b499da1d39f74bfb057cf2b92ef4e84dc6 
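
Concretely, each step is just a decompile/edit/recompile of the crush map 
(the file names here are arbitrary): 

ceph osd getcrushmap -o crush.bin 
crushtool -d crush.bin -o crush.txt 
# set (or add) 'tunable chooseleaf_vary_r 4' in crush.txt, then 3, 2, 1 on later rounds 
crushtool -c crush.txt -o crush.new 
ceph osd setcrushmap -i crush.new 
# let recovery finish and the cluster settle before the next step down 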

In my test case, going from 0 to 4 was about 1/10th as bad as going 
straight from 0 to 1, but the final step from 2 to 1 is still about 1/2 as 
bad. I'm not sure whether that means the intermediate steps aren't worth 
the trouble and you may as well jump straight to the firefly tunables, or 
whether legacy users should just set (and leave) this at 2, 3 or 4 and get 
almost all of the benefit without the rebalance pain. 

sage 
_______________________________________________ 
ceph-users mailing list 
ceph-users at lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 


