Hi Christian,

As you'll probably notice, we have 11, 22, 33 and 44 marked as out as well, but here's our tree. All of the OSDs in question had already been rebalanced/emptied from their hosts. osd.0 existed on pbnerbd01.

# ceph osd tree
# id    weight  type name               up/down reweight
-1      54      root default
-3      54        rack unknownrack
-2      10          host pbnerbd01
1       1             osd.1             up      1
10      1             osd.10            up      1
2       1             osd.2             up      1
3       1             osd.3             up      1
4       1             osd.4             up      1
5       1             osd.5             up      1
6       1             osd.6             up      1
7       1             osd.7             up      1
8       1             osd.8             up      1
9       1             osd.9             up      1
-4      11          host pbnerbd02
11      1             osd.11            up      0
12      1             osd.12            up      1
13      1             osd.13            up      1
14      1             osd.14            up      1
15      1             osd.15            up      1
16      1             osd.16            up      1
17      1             osd.17            up      1
18      1             osd.18            up      1
19      1             osd.19            up      1
20      1             osd.20            up      1
21      1             osd.21            up      1
-5      11          host pbnerbd03
22      1             osd.22            up      0
23      1             osd.23            up      1
24      1             osd.24            up      1
25      1             osd.25            up      1
26      1             osd.26            up      1
27      1             osd.27            up      1
28      1             osd.28            up      1
29      1             osd.29            up      1
30      1             osd.30            up      1
31      1             osd.31            up      1
32      1             osd.32            up      1
-6      11          host pbnerbd04
33      1             osd.33            up      0
34      1             osd.34            up      1
35      1             osd.35            up      1
36      1             osd.36            up      1
37      1             osd.37            up      1
38      1             osd.38            up      1
39      1             osd.39            up      1
40      1             osd.40            up      1
41      1             osd.41            up      1
42      1             osd.42            up      1
43      1             osd.43            up      1
-7      11          host pbnerbd05
44      1             osd.44            up      0
45      1             osd.45            up      1
46      1             osd.46            up      1
47      1             osd.47            up      1
48      1             osd.48            up      1
49      1             osd.49            up      1
50      1             osd.50            up      1
51      1             osd.51            up      1
52      1             osd.52            up      1
53      1             osd.53            up      1
54      1             osd.54            up      1

Regards,
Quenten Grasso

-----Original Message-----
From: Christian Balzer [mailto:chibi@xxxxxxx]
Sent: Tuesday, 27 January 2015 11:33 AM
To: ceph-users@xxxxxxxxxxxxxx
Cc: Quenten Grasso
Subject: Re: OSD removal rebalancing again

Hello,

A "ceph -s" and "ceph osd tree" would have been nice, but my guess is that osd.0 was the only OSD on that particular storage server?

In that case the removal of the bucket (host), by removing the last OSD in it, also triggered a rebalancing. Not really/well documented AFAIK and annoying, but OTOH both expected (from a CRUSH perspective) and harmless.

Christian

On Tue, 27 Jan 2015 01:21:28 +0000 Quenten Grasso wrote:

> Hi All,
>
> I just removed an OSD from our cluster following the steps on
> http://ceph.com/docs/master/rados/operations/add-or-rm-osds/
>
> First I set the OSD as out:
>
> ceph osd out osd.0
>
> This emptied the OSD and eventually the health of the cluster came back
> to normal/OK, and the OSD was up and out (this took about 2-3 hours).
> osd.0's used space before setting it out was ~900 GB; after the
> rebalance took place, its usage was ~150 MB.
>
> Once this was all OK, I proceeded to stop the OSD:
>
> service ceph stop osd.0
>
> I checked the cluster health and all looked OK, then I removed the OSD
> using the following commands:
>
> ceph osd crush remove osd.0
> ceph auth del osd.0
> ceph osd rm 0
>
> Now our cluster says:
>
> health HEALTH_WARN 414 pgs backfill; 12 pgs backfilling; 19 pgs
> recovering; 344 pgs recovery_wait; 789 pgs stuck unclean; recovery
> 390967/10986568 objects degraded (3.559%)
>
> Before the removal procedure everything was "OK" and osd.0 had been
> emptied and seemingly rebalanced.
>
> Any ideas why it's rebalancing again?
>
> We're using Ubuntu 12.04 with Ceph 0.80.8 and kernel 3.13.0-43-generic
> #72~precise1-Ubuntu SMP Tue Dec 9 12:14:18 UTC 2014 x86_64 x86_64
> x86_64 GNU/Linux
>
> Regards,
> Quenten Grasso

--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
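
For reference, one commonly suggested way to avoid the second data movement described in this thread is to drain the OSD by dropping its CRUSH weight to zero before removing it, so that the later "ceph osd crush remove" no longer changes any bucket weights. The following is only a minimal sketch, assuming the OSD id is 0 and a sysvinit-based Ubuntu install as in the original post; the commands are standard Ceph CLI, but verify them against your release's documentation before use.

# Drain osd.0 via its CRUSH weight instead of only marking it out;
# data then moves once, and the later "crush remove" does not change
# placement again.
ceph osd crush reweight osd.0 0

# Wait until the cluster reports HEALTH_OK again (all PGs active+clean).
ceph -s

# Now mark it out, stop the daemon, and remove it; no further
# rebalancing is expected at this point.
ceph osd out osd.0
service ceph stop osd.0
ceph osd crush remove osd.0
ceph auth del osd.0
ceph osd rm 0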