On Tue, 27 Jan 2015 01:37:52 +0000 Quenten Grasso wrote:

> Hi Christian,
>
> As you'll probably notice, we have 11, 22, 33 and 44 marked as out as
> well, but here's our tree.
>
> All of the OSDs in question had already been rebalanced/emptied from
> the hosts. osd.0 existed on pbnerbd01.
>
Ah, lemme re-phrase that then, I was assuming a simpler scenario.

Same reasoning: by removing the OSD, the weight (not the reweight) of the
host changed (from 11 to 10), and that then triggered the re-balancing.

Clear as mud? ^.^

Christian

> # ceph osd tree
> # id    weight  type name               up/down reweight
> -1      54      root default
> -3      54              rack unknownrack
> -2      10                      host pbnerbd01
> 1       1                               osd.1   up      1
> 10      1                               osd.10  up      1
> 2       1                               osd.2   up      1
> 3       1                               osd.3   up      1
> 4       1                               osd.4   up      1
> 5       1                               osd.5   up      1
> 6       1                               osd.6   up      1
> 7       1                               osd.7   up      1
> 8       1                               osd.8   up      1
> 9       1                               osd.9   up      1
> -4      11                      host pbnerbd02
> 11      1                               osd.11  up      0
> 12      1                               osd.12  up      1
> 13      1                               osd.13  up      1
> 14      1                               osd.14  up      1
> 15      1                               osd.15  up      1
> 16      1                               osd.16  up      1
> 17      1                               osd.17  up      1
> 18      1                               osd.18  up      1
> 19      1                               osd.19  up      1
> 20      1                               osd.20  up      1
> 21      1                               osd.21  up      1
> -5      11                      host pbnerbd03
> 22      1                               osd.22  up      0
> 23      1                               osd.23  up      1
> 24      1                               osd.24  up      1
> 25      1                               osd.25  up      1
> 26      1                               osd.26  up      1
> 27      1                               osd.27  up      1
> 28      1                               osd.28  up      1
> 29      1                               osd.29  up      1
> 30      1                               osd.30  up      1
> 31      1                               osd.31  up      1
> 32      1                               osd.32  up      1
> -6      11                      host pbnerbd04
> 33      1                               osd.33  up      0
> 34      1                               osd.34  up      1
> 35      1                               osd.35  up      1
> 36      1                               osd.36  up      1
> 37      1                               osd.37  up      1
> 38      1                               osd.38  up      1
> 39      1                               osd.39  up      1
> 40      1                               osd.40  up      1
> 41      1                               osd.41  up      1
> 42      1                               osd.42  up      1
> 43      1                               osd.43  up      1
> -7      11                      host pbnerbd05
> 44      1                               osd.44  up      0
> 45      1                               osd.45  up      1
> 46      1                               osd.46  up      1
> 47      1                               osd.47  up      1
> 48      1                               osd.48  up      1
> 49      1                               osd.49  up      1
> 50      1                               osd.50  up      1
> 51      1                               osd.51  up      1
> 52      1                               osd.52  up      1
> 53      1                               osd.53  up      1
> 54      1                               osd.54  up      1
>
> Regards,
> Quenten Grasso
>
> -----Original Message-----
> From: Christian Balzer [mailto:chibi@xxxxxxx]
> Sent: Tuesday, 27 January 2015 11:33 AM
> To: ceph-users@xxxxxxxxxxxxxx
> Cc: Quenten Grasso
> Subject: Re: OSD removal rebalancing again
>
> Hello,
>
> A "ceph -s" and "ceph osd tree" would have been nice, but my guess is
> that osd.0 was the only OSD on that particular storage server?
>
> In that case the removal of the bucket (host) by removing the last OSD
> in it also triggered a re-balancing. Not really/well documented AFAIK
> and annoying, but OTOH both expected (from a CRUSH perspective) and
> harmless.
>
> Christian
>
> On Tue, 27 Jan 2015 01:21:28 +0000 Quenten Grasso wrote:
>
> > Hi All,
> >
> > I just removed an OSD from our cluster, following the steps on
> > http://ceph.com/docs/master/rados/operations/add-or-rm-osds/
> >
> > First I set the OSD as out:
> >
> > ceph osd out osd.0
> >
> > This emptied the OSD and eventually the health of the cluster came
> > back to normal/OK, and the OSD was up and out (took about 2-3 hours).
> > osd.0's used space before setting it as out was ~900 GB; after the
> > rebalance took place, its usage was ~150 MB.
> >
> > Once this was all OK I then proceeded to stop the OSD:
> >
> > service ceph stop osd.0
> >
> > I checked cluster health and all looked OK, then I decided to remove
> > the OSD using the following commands:
> >
> > ceph osd crush remove osd.0
> > ceph auth del osd.0
> > ceph osd rm 0
> >
> > Now our cluster says:
> >
> >    health HEALTH_WARN 414 pgs backfill; 12 pgs backfilling; 19 pgs
> >    recovering; 344 pgs recovery_wait; 789 pgs stuck unclean; recovery
> >    390967/10986568 objects degraded (3.559%)
> >
> > Before using the removal procedure everything was "ok" and osd.0 had
> > been emptied and seemingly rebalanced.
> > Any ideas why it's rebalancing again?
> >
> > We're using Ubuntu 12.04 with Ceph 0.80.8 and kernel 3.13.0-43-generic
> > #72~precise1-Ubuntu SMP Tue Dec 9 12:14:18 UTC 2014 x86_64 x86_64
> > x86_64 GNU/Linux
> >
> > Regards,
> > Quenten Grasso


-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
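
For anyone hitting the same double rebalance: a commonly used order of
operations avoids it by draining the CRUSH weight first, so the host
bucket has already shrunk before the item is removed. This is only a
minimal sketch, not something verified on this exact cluster; it assumes
an OSD like osd.0 with CRUSH weight 1, as in the tree above, and that you
wait for the cluster to settle between steps.

    ceph osd crush reweight osd.0 0   # host bucket weight drops (e.g. 11 -> 10) now;
                                      # this triggers the one and only data migration
    # wait until all PGs are active+clean again
    ceph osd out osd.0                # no further movement; the CRUSH weight is already 0
    service ceph stop osd.0
    ceph osd crush remove osd.0       # removes a zero-weight item
    ceph auth del osd.0
    ceph osd rm 0

Because the item being removed no longer contributes any weight to its
host bucket, the final "ceph osd crush remove" should not trigger the
second round of backfilling described above.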