Yep!

On Thu, Jan 9, 2014 at 11:01 AM, Dan Van Der Ster
<daniel.vanderster@xxxxxxx> wrote:
> Thanks Greg. One thought I had is that I might try just crush rm'ing
> the OSD instead of, or just after, marking it out... That should avoid
> the double rebalance, right?
>
> Cheers, Dan
>
> On Jan 9, 2014 7:57 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> On Thu, Jan 9, 2014 at 6:27 AM, Dan Van Der Ster
> <daniel.vanderster@xxxxxxx> wrote:
>> Here's a more direct question. Given this osd tree:
>>
>> # ceph osd tree | head
>> # id    weight   type name                      up/down  reweight
>> -1      2952     root default
>> -2      2952       room 0513-R-0050
>> -3      262.1        rack RJ35
>> ...
>> -14     135.8        rack RJ57
>> -51     0              host p05151113781242
>> -52     5.46           host p05151113782262
>> 1036    2.73             osd.1036              DNE
>> 1037    2.73             osd.1037              DNE
>> ...
>>
>> If I do
>>
>>   ceph osd crush rm osd.1036
>>
>> or even
>>
>>   ceph osd crush reweight osd.1036 2.5
>>
>> it is going to result in some backfilling. Why?
>
> Yeah, this (and the more specific case you saw when removing OSDs) is
> just an unfortunate consequence of CRUSH's hierarchical weights. When
> you reweight or remove an OSD, you change the weight of the buckets
> that contain it (host, rack, room, etc.), and that slightly changes
> the calculated data placements. Marking an OSD out does not change
> the containing bucket weights.
>
> We could change that, but it has a bunch of fiddly consequences
> elsewhere (removing an OSD becomes a less local recovery event; if
> you replace a drive you still have to go through non-local recovery;
> etc.), and we haven't yet come up with a UX that we actually like
> around this workflow, so the existing behavior wins by default.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>> Cheers, Dan
>>
>> On 09 Jan 2014, at 12:11, Dan van der Ster
>> <daniel.vanderster@xxxxxxx> wrote:
>>
>>> Hi,
>>> I'm slightly confused about one thing we are observing at the
>>> moment. We're testing the shutdown/removal of OSD servers and
>>> noticed twice as much backfilling as expected. This is what we did:
>>>
>>> 1. service ceph stop on some OSD servers.
>>> 2. ceph osd out for the above OSDs (to avoid waiting for the
>>>    down-to-out timeout).
>>>    -- At this point, backfilling begins and finishes successfully
>>>    after some time.
>>> 3. ceph osd rm for all of the above OSDs (leaves the OSDs in the
>>>    CRUSH map, marked DNE).
>>> 4. ceph osd crush rm for each of the above OSDs.
>>>    -- Step 4 triggers another rebalance, even though those OSDs
>>>    hold no data and all PGs were previously healthy.
>>>
>>> Is this expected? Is there a way to avoid the 2nd rebalance?
>>>
>>> Best Regards,
>>> Dan van der Ster
>>> CERN IT
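
In command form, the distinction above between marking an OSD out and
changing its CRUSH weight looks roughly like this (a sketch only;
osd.1036 from the tree above is the example id):

  # Marking out zeroes the osdmap reweight. The CRUSH bucket weights
  # are untouched, so only the PGs on this OSD get remapped:
  ceph osd out osd.1036

  # Reweighting in CRUSH shrinks the containing host/rack/room buckets
  # too, so placements elsewhere in the hierarchy can shift as well:
  ceph osd crush reweight osd.1036 0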
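
And the single-rebalance removal this thread converges on, as a minimal
sketch: remove the stopped OSD from CRUSH directly rather than marking
it out first, so the bucket weights change only once. osd.1036 again
stands in for each OSD being decommissioned, and the final auth cleanup
is assumed standard practice rather than something discussed above:

  service ceph stop osd.1036     # stop the daemon on its host
  ceph osd crush rm osd.1036     # remove from CRUSH: the one and only
                                 # rebalance happens here
  ceph osd rm osd.1036           # remove it from the osdmap
  ceph auth del osd.1036         # drop its cephx key (assumed cleanup)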
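
To estimate how much data a CRUSH change will move before applying it,
crushtool can replay mappings against an edited map offline. A hedged
sketch; the rule id, replica count, and filenames are illustrative:

  ceph osd getcrushmap -o crush.orig     # grab the live CRUSH map
  crushtool -d crush.orig -o crush.txt   # decompile to editable text
  # ... edit crush.txt (e.g. remove osd.1036, adjust weights) ...
  crushtool -c crush.txt -o crush.new    # recompile
  crushtool --test -i crush.orig --show-mappings --rule 0 --num-rep 3 > before
  crushtool --test -i crush.new --show-mappings --rule 0 --num-rep 3 > after
  diff before after | grep -c '^>'       # rough count of changed mappings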