Hello Quenten,

On Tue, 27 Jan 2015 02:02:13 +0000 Quenten Grasso wrote:

> Hi Christian,
>
> Ahh yes, the overall host weight changed when removing the OSD: all
> OSDs make up the host weight, so removing the OSD decreased the host
> weight, which in turn triggered the rebalancing.
>
Indeed.

> I guess it would have made more sense if setting the OSD "out" had
> caused the same effect earlier, instead of after removing the already
> emptied disk. *frustrating*
>
Yes and no, that behavior can be used to your advantage when you're
replacing disks. Re-adding that OSD (same ID) should shorten the
rebalancing significantly (a rough sketch of such a re-add is at the
bottom of this message).

> So would it be possible/recommended to "statically" set the host
> weight to 11 in this case, so that once the removal from CRUSH
> happens it shouldn't cause a rebalance, because it's already been
> rebalanced anyway?
>
I don't think so, but somebody else correct me if I'm wrong.

If you're actually _replacing_ those OSDs and not permanently removing
them, search the ML archives for some tricks (by Craig Lewis IIRC) to
minimize the balancing song and dance.

Christian
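A rough sketch of one way to avoid that second rebalance when an OSD is
being removed for good (untested here, osd.0 purely as the example):
drain it by zeroing its CRUSH weight first, so the host bucket weight
only changes once.

  # Zero the CRUSH weight; this shrinks the host bucket now and
  # triggers the one and only data movement.
  ceph osd crush reweight osd.0 0

  # Once the cluster is back to HEALTH_OK, take the OSD down.
  ceph osd out osd.0
  service ceph stop osd.0

  # With its CRUSH weight already 0, removing it no longer changes
  # the host bucket weight, so no further rebalance should follow.
  ceph osd crush remove osd.0
  ceph auth del osd.0
  ceph osd rm 0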
> Regards,
> Quenten Grasso
>
>
> -----Original Message-----
> From: Christian Balzer [mailto:chibi@xxxxxxx]
> Sent: Tuesday, 27 January 2015 11:53 AM
> To: ceph-users@xxxxxxxxxxxxxx
> Cc: Quenten Grasso
> Subject: Re: OSD removal rebalancing again
>
> On Tue, 27 Jan 2015 01:37:52 +0000 Quenten Grasso wrote:
>
> > Hi Christian,
> >
> > As you'll probably notice, we have 11, 22, 33 and 44 marked as out
> > as well, but here's our tree.
> >
> > All of the OSDs in question had already been rebalanced/emptied from
> > the hosts. osd.0 existed on pbnerbd01.
> >
> Ah, lemme re-phrase that then, I was assuming a simpler scenario.
>
> Same reasoning: by removing the OSD, the weight (not reweight) of the
> host changed (from 11 to 10) and that then triggered the re-balancing.
>
> Clear as mud? ^.^
>
> Christian
>
> > # ceph osd tree
> > # id  weight  type name            up/down  reweight
> > -1    54      root default
> > -3    54        rack unknownrack
> > -2    10          host pbnerbd01
> > 1     1             osd.1          up       1
> > 10    1             osd.10         up       1
> > 2     1             osd.2          up       1
> > 3     1             osd.3          up       1
> > 4     1             osd.4          up       1
> > 5     1             osd.5          up       1
> > 6     1             osd.6          up       1
> > 7     1             osd.7          up       1
> > 8     1             osd.8          up       1
> > 9     1             osd.9          up       1
> > -4    11          host pbnerbd02
> > 11    1             osd.11         up       0
> > 12    1             osd.12         up       1
> > 13    1             osd.13         up       1
> > 14    1             osd.14         up       1
> > 15    1             osd.15         up       1
> > 16    1             osd.16         up       1
> > 17    1             osd.17         up       1
> > 18    1             osd.18         up       1
> > 19    1             osd.19         up       1
> > 20    1             osd.20         up       1
> > 21    1             osd.21         up       1
> > -5    11          host pbnerbd03
> > 22    1             osd.22         up       0
> > 23    1             osd.23         up       1
> > 24    1             osd.24         up       1
> > 25    1             osd.25         up       1
> > 26    1             osd.26         up       1
> > 27    1             osd.27         up       1
> > 28    1             osd.28         up       1
> > 29    1             osd.29         up       1
> > 30    1             osd.30         up       1
> > 31    1             osd.31         up       1
> > 32    1             osd.32         up       1
> > -6    11          host pbnerbd04
> > 33    1             osd.33         up       0
> > 34    1             osd.34         up       1
> > 35    1             osd.35         up       1
> > 36    1             osd.36         up       1
> > 37    1             osd.37         up       1
> > 38    1             osd.38         up       1
> > 39    1             osd.39         up       1
> > 40    1             osd.40         up       1
> > 41    1             osd.41         up       1
> > 42    1             osd.42         up       1
> > 43    1             osd.43         up       1
> > -7    11          host pbnerbd05
> > 44    1             osd.44         up       0
> > 45    1             osd.45         up       1
> > 46    1             osd.46         up       1
> > 47    1             osd.47         up       1
> > 48    1             osd.48         up       1
> > 49    1             osd.49         up       1
> > 50    1             osd.50         up       1
> > 51    1             osd.51         up       1
> > 52    1             osd.52         up       1
> > 53    1             osd.53         up       1
> > 54    1             osd.54         up       1
> >
> > Regards,
> > Quenten Grasso
> >
> > -----Original Message-----
> > From: Christian Balzer [mailto:chibi@xxxxxxx]
> > Sent: Tuesday, 27 January 2015 11:33 AM
> > To: ceph-users@xxxxxxxxxxxxxx
> > Cc: Quenten Grasso
> > Subject: Re: OSD removal rebalancing again
> >
> > Hello,
> >
> > A "ceph -s" and "ceph osd tree" would have been nice, but my guess is
> > that osd.0 was the only OSD on that particular storage server?
> >
> > In that case the removal of the bucket (host) by removing the last
> > OSD in it also triggered a re-balancing. Not really/well documented
> > AFAIK and annoying, but OTOH both expected (from a CRUSH perspective)
> > and harmless.
> >
> > Christian
> >
> > On Tue, 27 Jan 2015 01:21:28 +0000 Quenten Grasso wrote:
> >
> > > Hi All,
> > >
> > > I just removed an OSD from our cluster following the steps at
> > > http://ceph.com/docs/master/rados/operations/add-or-rm-osds/
> > >
> > > First I set the OSD as out:
> > >
> > > ceph osd out osd.0
> > >
> > > This emptied the OSD and eventually the health of the cluster came
> > > back to normal/OK, with the OSD up and out. It took about 2-3
> > > hours; osd.0 used ~900 GB before being set out and ~150 MB after
> > > the rebalance took place.
> > >
> > > Once this was all OK I then proceeded to stop the OSD:
> > >
> > > service ceph stop osd.0
> > >
> > > I checked the cluster health and all looked OK, then I decided to
> > > remove the OSD using the following commands:
> > >
> > > ceph osd crush remove osd.0
> > > ceph auth del osd.0
> > > ceph osd rm 0
> > >
> > > Now our cluster says:
> > >
> > > health HEALTH_WARN 414 pgs backfill; 12 pgs backfilling; 19 pgs
> > > recovering; 344 pgs recovery_wait; 789 pgs stuck unclean; recovery
> > > 390967/10986568 objects degraded (3.559%)
> > >
> > > Before using the removal procedure everything was "ok" and osd.0
> > > had been emptied and seemingly rebalanced.
> > >
> > > Any ideas why it's rebalancing again?
> > >
> > > We're using Ubuntu 12.04 with Ceph 0.80.8 and kernel
> > > 3.13.0-43-generic #72~precise1-Ubuntu SMP Tue Dec 9 12:14:18 UTC
> > > 2014 x86_64 x86_64 x86_64 GNU/Linux.
> > >
> > > Regards,
> > > Quenten Grasso


--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
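For the "replace the disk, keep the ID" case mentioned at the top of
this message, a rough sketch of a manual re-add (osd.0, a CRUSH weight
of 1 and the default /var/lib/ceph paths are assumptions here; note
that "ceph osd create" only hands the old ID back if it is the lowest
free one):

  # Partition/format and mount the replacement disk at
  # /var/lib/ceph/osd/ceph-0 first, then re-register the OSD.
  ceph osd create                       # should return the freed ID, 0
  ceph-osd -i 0 --mkfs --mkkey          # initialise data dir and keyring
  ceph auth add osd.0 osd 'allow *' mon 'allow rwx' \
      -i /var/lib/ceph/osd/ceph-0/keyring

  # Put it back into the same host bucket with the same weight as
  # before, so the CRUSH map ends up (nearly) where it started.
  ceph osd crush add osd.0 1 host=pbnerbd01
  service ceph start osd.0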