Re: [solved] Changing CRUSH rule on a running cluster

Olivier Bonvalet <ceph.list@xxxxxxxxx> · Fri, 08 Mar 2013 08:18:06 +0100

On Wednesday, 06 March 2013 at 09:39 +0100, Olivier Bonvalet wrote:
> 
> On Monday, 04 March 2013 at 09:02 -0800, Gregory Farnum wrote:
> > On Mon, Mar 4, 2013 at 12:19 AM, Olivier Bonvalet <ceph.list@xxxxxxxxx> wrote:
> > > Hello,
> > >
> > > I have a running cluster which uses the (previous) default CRUSH rule:
> > >         step take default
> > >         step choose firstn 0 type osd
> > >         step emit
> > >
> > > Since I have multiple OSDs on the same host, I need to change that to
> > > get better redundancy.
> > > My CRUSH map uses the types datacenter → room → network → rack → host → osd.
> > >
> > > For now I would like to replicate per "network". I suppose I can simply
> > > replace "step choose firstn 0 type osd" with "step choose firstn 0 type
> > > network", but is it safe to do that on a running cluster?
> > 
> > Not quite — you'll want to do "step chooseleaf firstn 0 type network".
> > See http://ceph.com/docs/master/rados/operations/crush-map for details
> > on this stuff.
> > Yes, it is safe to do on a running cluster.
> > 
> > 
> > > And I suppose it will trigger a lot of data movement. Is it possible to
> > > do that "slowly", to avoid slowing down production?
> > 
> > Yeah, it's likely to move the entire contents. If you're on Argonaut
> > you're kind of stuck; if Bobtail then you have some control through
> > the "osd_max_backfills" option (defaults to 10) that specifies how
> > many moves each OSD will let itself be involved in at once. You should
> > practice on a non-production cluster first to make sure you don't run
> > into issues with pgtemp or OSDMap generation, though.
> > -Greg
> > 
> 
> Thanks for your answer. I ran some tests on a dedicated pool, and I was
> able to move data from "platter" to "SSD" without trouble, which is great.
> 
> But I can't get a per-"network" or per-"host" setup to work.
> With 2 hosts, each with 2 OSDs, and a pool that uses only 1
> replica (so, 2 copies), I tried this rule:
> 
>         rule rbdperhost {
>         	ruleset 5
>         	type replicated
>         	min_size 1
>         	max_size 10
>         	step take default
>         	step chooseleaf firstn 0 type host
>         	step emit
>         }
> 
> 
> As a result, some PGs are stuck in the "active+remapped" state.
> When I query one of these PGs, I see that CRUSH finds only one OSD up for
> it and can't find another OSD to hold the replica.
> 
> If I understand correctly, "chooseleaf firstn 0 type host" tells Ceph
> to choose 2 different hosts, then to choose one OSD in each of them.
> So with 2 hosts it should work, no?
> 
> Thanks,
> Olivier B.
> 
> 

So, as discussed on IRC, it's solved. My rules were not working, and after
enabling the "tunables" they work fine.

I love being able to change the data distribution on a live cluster!
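
For anyone finding this thread later, here is a toy Python sketch of the
difference between the two rules quoted above ("step choose firstn 0 type
osd" versus "step chooseleaf firstn 0 type host"). It is not the real CRUSH
algorithm: the host names, the SHA-256 based draw() helper and the
max_tries retry budget are all assumptions made for illustration. It only
shows why picking OSDs directly can put both copies on one host, while
picking hosts first and then one OSD per host cannot.

```python
import hashlib

# Hypothetical toy hierarchy: 2 hosts with 2 OSDs each, as in the test above.
HOSTS = {"host-a": [0, 1], "host-b": [2, 3]}


def draw(pg, salt, items):
    """Deterministic pseudo-random pick, standing in for CRUSH's hashing."""
    digest = hashlib.sha256(f"{pg}:{salt}".encode()).digest()
    return items[int.from_bytes(digest[:4], "big") % len(items)]


def pick_distinct(pg, items, count, max_tries):
    """Draw until `count` distinct items are found or the retry budget ends."""
    picked = []
    for attempt in range(max_tries):
        if len(picked) == count:
            break
        item = draw(pg, attempt, items)
        if item not in picked:
            picked.append(item)
    return picked


def choose_osd(pg, replicas, max_tries=50):
    """Toy 'step choose firstn N type osd': distinct OSDs, hosts ignored."""
    all_osds = sorted(osd for osds in HOSTS.values() for osd in osds)
    return pick_distinct(pg, all_osds, replicas, max_tries)


def chooseleaf_host(pg, replicas, max_tries=50):
    """Toy 'step chooseleaf firstn N type host': distinct hosts, one OSD each."""
    hosts = pick_distinct(pg, sorted(HOSTS), replicas, max_tries)
    return [draw(pg, f"leaf:{host}", HOSTS[host]) for host in hosts]


def hosts_used(osds):
    """Return the set of hosts that hold the given OSDs."""
    return {host for host, members in HOSTS.items()
            for osd in osds if osd in members}


if __name__ == "__main__":
    pgs = range(256)
    per_osd = sum(len(hosts_used(choose_osd(pg, 2))) < 2 for pg in pgs)
    per_host = sum(len(hosts_used(chooseleaf_host(pg, 2))) < 2 for pg in pgs)
    print("type osd : PGs with both copies on one host:", per_osd)
    print("type host: PGs with both copies on one host:", per_host)
```

Calling chooseleaf_host(pg, 2, max_tries=2) leaves some PGs with a single
OSD, which looks a little like the "only one OSD found" symptom described
above. That is only a loose analogy: the real tunables (choose_local_tries,
choose_total_tries and friends) control retries inside the actual CRUSH
descent, which this sketch does not reproduce.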

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com