Re: Changing CRUSH rule on a running cluster

On Monday, March 4, 2013 at 09:02 -0800, Gregory Farnum wrote:
> On Mon, Mar 4, 2013 at 12:19 AM, Olivier Bonvalet <ceph.list@xxxxxxxxx> wrote:
> > Hello,
> >
> > I have a running cluster which uses the (previous) default CRUSH rule:
> >         step take default
> >         step choose firstn 0 type osd
> >         step emit
> >
> > Since I have multiple OSDs on the same host, I need to change that to
> > get better redundancy.
> > My CRUSH map uses the types datacenter → room → network → rack → host → osd.
> >
> > For now I would like to replicate per "network". I suppose I can simply
> > replace "step choose firstn 0 type osd" with "step choose firstn 0 type
> > network", but is it safe to do that on a running cluster?
> 
> Not quite — you'll want to do "step chooseleaf firstn 0 type network".
> See http://ceph.com/docs/master/rados/operations/crush-map for details
> on this stuff.
> Yes, it is safe to do on a running cluster.
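
The usual round trip for editing a CRUSH map on a live cluster looks
roughly like this (a sketch; the file names here are arbitrary):

        # Export the current CRUSH map and decompile it to text
        ceph osd getcrushmap -o crushmap.bin
        crushtool -d crushmap.bin -o crushmap.txt

        # ... edit crushmap.txt, e.g. replace "step choose firstn 0 type osd"
        # with "step chooseleaf firstn 0 type network" ...

        # Recompile and inject the edited map into the running cluster
        crushtool -c crushmap.txt -o crushmap.new
        ceph osd setcrushmap -i crushmap.new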
> 
> 
> > And I suppose it will trigger a lot of data movement; is it possible to
> > do that "slowly", to avoid slowing down production?
> 
> Yeah, it's likely to move the entire contents. If you're on Argonaut
> you're kind of stuck; if Bobtail then you have some control through
> the "osd_max_backfills" option (defaults to 10) that specifies how
> many moves each OSD will let itself be involved in at once. You should
> practice on a non-production cluster first to make sure you don't run
> into issues with pgtemp or OSDMap generation, though.
> -Greg
> 
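On Bobtail and later, the backfill throttle Greg mentions can be lowered
at runtime before the new map is injected. A sketch, assuming the
injectargs form (the exact syntax varies a bit between releases):

        # Lower the per-OSD backfill concurrency on all OSDs at runtime
        ceph tell osd.\* injectargs '--osd_max_backfills 1'

        # Or set it persistently in ceph.conf under [osd]:
        #   osd max backfills = 1

It can be raised again once the data movement settles.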

Thanks for your answer. I made some tests on a dedicated pool: I was able
to move data from "platter" to "SSD" without problems, which is great.

But I can't get either the per-"network" or the per-"host" setup to work:
with 2 hosts, each with 2 OSDs, and a pool that uses only 1 replica
(so 2 copies in total), I tried this rule:

        rule rbdperhost {
        	ruleset 5
        	type replicated
        	min_size 1
        	max_size 10
        	step take default
        	step chooseleaf firstn 0 type host
        	step emit
        }
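
Such a rule can also be checked offline before it is injected, using
crushtool's test mode. A sketch, assuming the compiled map is in
crushmap.new (the flags may differ slightly between releases):

        # Simulate placements for ruleset 5 with 2 replicas and show
        # the resulting mappings and placement statistics
        crushtool -i crushmap.new --test --rule 5 --num-rep 2 \
                --show-mappings --show-statistics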


As a result, some PGs get stuck in the "active+remapped" state. When
querying one of these PGs, I see that CRUSH finds only one OSD up for it,
and can't find another OSD to place the replica on.
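
The stuck PGs can be listed and inspected like this (a sketch; 2.30 below
is a placeholder PG id):

        # List PGs that are stuck unclean, then query one of them
        ceph pg dump_stuck unclean
        ceph pg 2.30 query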

If I understand correctly, in this case "chooseleaf firstn 0 type host"
tells Ceph to choose 2 different hosts, and then choose one OSD in each of
them. So with 2 hosts, it should work, no?

Thanks,
Olivier B.





_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


