Re: OSD removal rebalancing again

Hi Christian,

Ahh yes, the overall host weight changed when I removed the OSD: since all the OSDs' weights add up to the host weight, removing the OSD decreased the host weight, which in turn triggered the rebalancing.

I guess it would have made more sense if setting the OSD "out" had caused the same effect up front, rather than it happening after removing the already-emptied disk. *frustrating*

So would it be possible/recommended to "statically" set the host weight to 11 in this case, so that when the removal from CRUSH happens it doesn't cause a rebalance, given the data has already been rebalanced anyway?
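Or, put differently, could we drop the OSD's CRUSH weight to zero up front instead of marking it out, so the host weight goes from 11 to 10 before the data moves and the later "crush remove" no longer changes any weights? Roughly this (just a sketch from the docs, not something I've tested):

# drain the disk by zeroing its CRUSH weight (host weight drops 11 -> 10 here)
ceph osd crush reweight osd.0 0
# wait for HEALTH_OK, then stop the daemon
service ceph stop osd.0
# removing a weight-0 OSD leaves the host weight unchanged, so no second rebalance
ceph osd crush remove osd.0
ceph auth del osd.0
ceph osd rm 0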

Regards,
Quenten Grasso


-----Original Message-----
From: Christian Balzer [mailto:chibi@xxxxxxx] 
Sent: Tuesday, 27 January 2015 11:53 AM
To: ceph-users@xxxxxxxxxxxxxx
Cc: Quenten Grasso
Subject: Re:  OSD removal rebalancing again

On Tue, 27 Jan 2015 01:37:52 +0000 Quenten Grasso wrote:

> Hi Christian,
> 
> As you'll probably notice, we have 11, 22, 33 and 44 marked as out as
> well, but here's our tree.
> 
> All of the OSDs in question had already been rebalanced/emptied on
> their hosts; osd.0 existed on pbnerbd01.
> 
Ah, lemme re-phrase that then, I was assuming a simpler scenario. 

Same reasoning: by removing the OSD, the weight (not the reweight) of the host changed (from 11 to 10), and that then triggered the re-balancing. 

Clear as mud? ^.^
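For reference, the two knobs are set with different commands (syntax as per the Firefly docs, double-check against your version):

# "reweight" (the last column in "ceph osd tree") is a temporary override;
# "ceph osd out" sets it to 0 and it does not touch the host bucket weight
ceph osd reweight 0 0
# "weight" is the CRUSH weight, which does roll up into the host bucket weight
ceph osd crush reweight osd.0 0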

Christian

> 
> # ceph osd tree
> # id    weight  type name       up/down reweight
> -1      54      root default
> -3      54              rack unknownrack
> -2      10                      host pbnerbd01
> 1       1                               osd.1   up      1
> 10      1                               osd.10  up      1
> 2       1                               osd.2   up      1
> 3       1                               osd.3   up      1
> 4       1                               osd.4   up      1
> 5       1                               osd.5   up      1
> 6       1                               osd.6   up      1
> 7       1                               osd.7   up      1
> 8       1                               osd.8   up      1
> 9       1                               osd.9   up      1
> -4      11                      host pbnerbd02
> 11      1                               osd.11  up      0
> 12      1                               osd.12  up      1
> 13      1                               osd.13  up      1
> 14      1                               osd.14  up      1
> 15      1                               osd.15  up      1
> 16      1                               osd.16  up      1
> 17      1                               osd.17  up      1
> 18      1                               osd.18  up      1
> 19      1                               osd.19  up      1
> 20      1                               osd.20  up      1
> 21      1                               osd.21  up      1
> -5      11                      host pbnerbd03
> 22      1                               osd.22  up      0
> 23      1                               osd.23  up      1
> 24      1                               osd.24  up      1
> 25      1                               osd.25  up      1
> 26      1                               osd.26  up      1
> 27      1                               osd.27  up      1
> 28      1                               osd.28  up      1
> 29      1                               osd.29  up      1
> 30      1                               osd.30  up      1
> 31      1                               osd.31  up      1
> 32      1                               osd.32  up      1
> -6      11                      host pbnerbd04
> 33      1                               osd.33  up      0
> 34      1                               osd.34  up      1
> 35      1                               osd.35  up      1
> 36      1                               osd.36  up      1
> 37      1                               osd.37  up      1
> 38      1                               osd.38  up      1
> 39      1                               osd.39  up      1
> 40      1                               osd.40  up      1
> 41      1                               osd.41  up      1
> 42      1                               osd.42  up      1
> 43      1                               osd.43  up      1
> -7      11                      host pbnerbd05
> 44      1                               osd.44  up      0
> 45      1                               osd.45  up      1
> 46      1                               osd.46  up      1
> 47      1                               osd.47  up      1
> 48      1                               osd.48  up      1
> 49      1                               osd.49  up      1
> 50      1                               osd.50  up      1
> 51      1                               osd.51  up      1
> 52      1                               osd.52  up      1
> 53      1                               osd.53  up      1
> 54      1                               osd.54  up      1
> 
> Regards,
> Quenten Grasso
> 
> -----Original Message-----
> From: Christian Balzer [mailto:chibi@xxxxxxx]
> Sent: Tuesday, 27 January 2015 11:33 AM
> To: ceph-users@xxxxxxxxxxxxxx
> Cc: Quenten Grasso
> Subject: Re:  OSD removal rebalancing again
> 
> 
> Hello,
> 
> A "ceph -s" and "ceph osd tree" would have been nice, but my guess is 
> that osd.0 was the only osd on that particular storage server?
> 
> In that case the removal of the bucket (host) by removing the last OSD 
> in it also triggered a re-balancing. Not really/well documented AFAIK 
> and annoying, but OTOH both expected (from a CRUSH perspective) and 
> harmless.
> 
> Christian
> 
> On Tue, 27 Jan 2015 01:21:28 +0000 Quenten Grasso wrote:
> 
> > Hi All,
> > 
> > I just removed an OSD from our cluster following the steps on 
> > http://ceph.com/docs/master/rados/operations/add-or-rm-osds/
> > 
> > First I set the OSD as out,
> > 
> > ceph osd out osd.0
> > 
> > This emptied the OSD, and eventually the health of the cluster came
> > back to normal/OK with the OSD up and out (this took about 2-3 hours;
> > osd.0's used space before setting it out was ~900 GB, and after the
> > rebalance took place its usage was ~150 MB).
> > 
> > Once this was all ok I then proceeded to STOP the OSD.
> > 
> > service ceph stop osd.0
> > 
> > I checked the cluster health and all looked OK, then I decided to
> > remove the OSD using the following commands.
> > 
> > ceph osd crush remove osd.0
> > ceph auth del osd.0
> > ceph osd rm 0
> > 
> > 
> > Now our cluster says
> > health HEALTH_WARN 414 pgs backfill; 12 pgs backfilling; 19 pgs 
> > recovering; 344 pgs recovery_wait; 789 pgs stuck unclean; recovery
> > 390967/10986568 objects degraded (3.559%)
> > 
> > Before using the removal procedure everything was "ok", and osd.0
> > had been emptied and seemingly rebalanced.
> > 
> > Any ideas why it's rebalancing again?
> > 
> > We're using Ubuntu 12.04 with Ceph 0.80.8 and kernel 3.13.0-43-generic
> > #72~precise1-Ubuntu SMP Tue Dec 9 12:14:18 UTC 2014 x86_64 x86_64
> > x86_64 GNU/Linux
> > 
> > 
> > 
> > Regards,
> > Quenten Grasso
> 
> 


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/