Re: backfilling after OSD marked out _and_ OSD removed

Thanks Greg. One thought I had is that I might try just crush rm'ing the OSD instead of, or just after, marking it out... That should avoid the double rebalance, right?
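Something like this, I guess (untested sketch; osd.1036 from the tree below is just the example id, and the auth cleanup line is my assumption about what's still needed):

   service ceph stop osd.1036     # on the OSD's host
   ceph osd crush rm osd.1036     # bucket weights change once -> single rebalance
   ceph auth del osd.1036         # drop the OSD's cephx key
   ceph osd rm osd.1036           # remove the id from the osdmap

i.e. skip the explicit "ceph osd out" entirely, so the containing bucket weights only change once.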

Cheers, Dan

On Jan 9, 2014 7:57 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
On Thu, Jan 9, 2014 at 6:27 AM, Dan Van Der Ster
<daniel.vanderster@xxxxxxx> wrote:
> Here’s a more direct question. Given this osd tree:
>
> # ceph osd tree  |head
> # id    weight  type name       up/down reweight
> -1      2952    root default
> -2      2952            room 0513-R-0050
> -3      262.1                   rack RJ35
> ...
> -14     135.8                   rack RJ57
> -51     0                               host p05151113781242
> -52     5.46                            host p05151113782262
> 1036    2.73                                    osd.1036        DNE
> 1037    2.73                                    osd.1037        DNE
> ...
>
>
> If I do
>
>    ceph osd crush rm osd.1036
>
> or even
>
>   ceph osd crush reweight osd.1036 2.5
>
> it is going to result in some backfilling. Why?

Yeah, this (and the more specific case you saw when removing OSDs) is
just an unfortunate consequence of CRUSH's hierarchical weights. When
you reweight or remove an OSD, you change the weight of the buckets
which contain it (host, rack, room, etc.), and that slightly changes
the calculated data placements. Marking an OSD out does not change
the containing bucket weights.
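
To make that concrete (hand-written illustration, not real output;
osd.1036 and its host come from your tree above):

   ceph osd out osd.1036                 # only flips the reweight column to 0

   # ceph osd tree (relevant lines)
   -52     5.46            host p05151113782262    <-- host weight unchanged
   1036    2.73                    osd.1036        down    0

   ceph osd crush reweight osd.1036 0    # changes the CRUSH weight itself

   # ceph osd tree (relevant lines)
   -52     2.73            host p05151113782262    <-- host (and rack, root) weights drop
   1036    0                       osd.1036        down    0

Roughly speaking, marking out only remaps what was sitting on
osd.1036, while changing the CRUSH weight alters the weights of the
containing host/rack/root, which is what shuffles other data around.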

We could change that, but it has a bunch of fiddly consequences
elsewhere (removing an OSD becomes a less local recovery operation; if
you replace a drive you still have to go through non-local recovery;
etc.), and we haven't yet come up with a UX that we actually like
around this workflow, so the existing behavior wins by default.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com



>
> Cheers, Dan
>
> On 09 Jan 2014, at 12:11, Dan van der Ster <daniel.vanderster@xxxxxxx> wrote:
>
>> Hi,
>> I’m slightly confused about one thing we are observing at the moment. We’re testing the shutdown/removal of OSD servers and noticed twice as much backfilling as expected. This is what we did:
>>
>> 1. service ceph stop on some OSD servers.
>> 2. ceph osd out for the above OSDs (to avoid waiting for the down-to-out timeout)
>> — at this point, backfilling begins and finishes successfully after some time.
>> 3. ceph osd rm for all of the above OSDs (leaves the OSDs in the CRUSH map, marked DNE)
>> 4. ceph osd crush rm for each of the above OSDs
>> — step 4 triggers another rebalance, even though there is no longer any data on those OSDs and all PGs were previously healthy.
>>
>> Is this expected? Is there a way to avoid the 2nd rebalance?
>>
>> Best Regards,
>> Dan van der Ster
>> CERN IT
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
