On Thu, Jan 9, 2014 at 6:27 AM, Dan Van Der Ster <daniel.vanderster@xxxxxxx> wrote:
> Here’s a more direct question. Given this osd tree:
>
> # ceph osd tree |head
> # id    weight  type name       up/down reweight
> -1      2952    root default
> -2      2952            room 0513-R-0050
> -3      262.1                   rack RJ35
> ...
> -14     135.8                   rack RJ57
> -51     0                               host p05151113781242
> -52     5.46                            host p05151113782262
> 1036    2.73                                    osd.1036        DNE
> 1037    2.73                                    osd.1037        DNE
> ...
>
> If I do
>
>     ceph osd crush rm osd.1036
>
> or even
>
>     ceph osd crush reweight osd.1036 2.5
>
> it is going to result in some backfilling. Why?

Yeah, this (and the more specific case you saw with removing OSDs) is just an unfortunate consequence of CRUSH's hierarchical weights. When you reweight or remove an OSD, you change the weight of every bucket that contains it (host, rack, room, etc.). That slightly changes the calculated data placements.

Marking an OSD out does not change the containing bucket weights. We could change that, but it has a bunch of fiddly consequences elsewhere (removing OSDs becomes a less local recovery operation; if you replace a drive you still have to go through non-local recovery; etc.), and we haven't yet come up with a UX that we actually like around this workflow, so the existing behavior wins by default.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

> Cheers, Dan
>
> On 09 Jan 2014, at 12:11, Dan van der Ster <daniel.vanderster@xxxxxxx> wrote:
>
>> Hi,
>> I’m slightly confused about one thing we are observing at the moment. We’re testing the shutdown/removal of OSD servers and noticed twice as much backfilling as expected. This is what we did:
>>
>> 1. service ceph stop on some OSD servers.
>> 2. ceph osd out for the above OSDs (to avoid waiting for the down-to-out timeout)
>> — at this point, backfilling begins and finishes successfully after some time.
>> 3. ceph osd rm all of the above OSDs (leaves the OSDs in the crush table, marked DNE)
>> 4.
>> ceph osd crush rm for each of the above OSDs
>> — step 4 triggers another rebalancing!! despite there not being any data on those OSDs and all PGs being previously healthy.
>>
>> Is this expected? Is there a way to avoid the 2nd rebalance?
>>
>> Best Regards,
>> Dan van der Ster
>> CERN IT
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
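The hierarchical-weight effect Greg describes can be sketched with a toy model. This is a simplified straw2-style draw, not Ceph's actual CRUSH implementation, and the host names and weights are invented for illustration: three hosts each carry two 2.73-weight OSDs, and `ceph osd crush rm` on one OSD halves its host bucket's weight even though that OSD was out and held no data.

```python
import hashlib
import math


def uniform(pg: int, item: str) -> float:
    """Deterministic pseudo-random value in (0, 1) derived from (pg, item)."""
    h = hashlib.sha256(f"{pg}:{item}".encode()).digest()
    return (int.from_bytes(h[:8], "big") + 1) / (2**64 + 2)


def straw2_choose(pg: int, weights: dict) -> str:
    """Straw2-style bucket selection: each item draws ln(u)/w; the largest
    (least negative) draw wins, so heavier items win proportionally more."""
    return max((item for item in weights if weights[item] > 0),
               key=lambda item: math.log(uniform(pg, item)) / weights[item])


# Hypothetical hosts; each bucket weight is the sum of its OSDs' crush weights.
before = {"hostA": 5.46, "hostB": 5.46, "hostC": 5.46}
# Removing one 2.73-weight OSD from hostB drops its bucket weight to 2.73.
after = dict(before, hostB=2.73)

moved = sum(straw2_choose(pg, before) != straw2_choose(pg, after)
            for pg in range(10000))
print(f"{moved} of 10000 PGs now map to a different host")
```

In this model only PGs that previously mapped to the shrunken bucket move (a straw2 property, since the other hosts' draws are unchanged), but a sizeable fraction does move, which is the backfill seen in step 4 above despite those OSDs holding no data.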