Yep!

On Thu, Jan 9, 2014 at 11:01 AM, Dan Van Der Ster
<daniel.vanderster@xxxxxxx> wrote:
> Thanks Greg. One thought I had is that I might try just crush rm'ing
> the OSD instead of, or just after, marking it out... That should avoid
> the double rebalance, right?
>
> Cheers, Dan
>
> On Jan 9, 2014 7:57 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> On Thu, Jan 9, 2014 at 6:27 AM, Dan Van Der Ster
> <daniel.vanderster@xxxxxxx> wrote:
>> Here's a more direct question. Given this osd tree:
>>
>> # ceph osd tree | head
>> # id    weight   type name                      up/down  reweight
>> -1      2952     root default
>> -2      2952       room 0513-R-0050
>> -3      262.1        rack RJ35
>> ...
>> -14     135.8        rack RJ57
>> -51     0              host p05151113781242
>> -52     5.46           host p05151113782262
>> 1036    2.73             osd.1036              DNE
>> 1037    2.73             osd.1037              DNE
>> ...
>>
>> If I do
>>
>>   ceph osd crush rm osd.1036
>>
>> or even
>>
>>   ceph osd crush reweight osd.1036 2.5
>>
>> it is going to result in some backfilling. Why?
>
> Yeah, this (and the more specific case you saw when removing OSDs) is
> just an unfortunate consequence of CRUSH's hierarchical weights. When
> you reweight or remove an OSD, you change the weight of the buckets
> that contain it (host, rack, room, etc.), and that slightly changes
> the calculated data placements. Marking an OSD out does not change
> the containing bucket weights.
>
> We could change that, but it has a bunch of fiddly consequences
> elsewhere (removing an OSD becomes a less local recovery event; if
> you replace a drive you still have to go through non-local recovery;
> etc.), and we haven't yet come up with a UX that we actually like
> around this workflow, so the existing behavior wins by default.
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>> Cheers, Dan
>>
>> On 09 Jan 2014, at 12:11, Dan van der Ster
>> <daniel.vanderster@xxxxxxx> wrote:
>>
>>> Hi,
>>> I'm slightly confused about one thing we are observing at the
>>> moment. We're testing the shutdown/removal of OSD servers and
>>> noticed twice as much backfilling as expected. This is what we did:
>>>
>>> 1. service ceph stop on some OSD servers.
>>> 2. ceph osd out for the above OSDs (to avoid waiting for the
>>>    down-to-out timeout).
>>>    -- At this point, backfilling begins and finishes successfully
>>>    after some time.
>>> 3. ceph osd rm for all of the above OSDs (leaves the OSDs in the
>>>    CRUSH map, marked DNE).
>>> 4. ceph osd crush rm for each of the above OSDs.
>>>    -- Step 4 triggers another rebalance, even though those OSDs
>>>    hold no data and all PGs were previously healthy.
>>>
>>> Is this expected? Is there a way to avoid the 2nd rebalance?
>>>
>>> Best Regards,
>>> Dan van der Ster
>>> CERN IT
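
In command form, the distinction above between marking an OSD out and
changing its CRUSH weight looks roughly like this (a sketch only;
osd.1036 from the tree above is the example id):

  # Marking out zeroes the osdmap reweight. The CRUSH bucket weights
  # are untouched, so only the PGs on this OSD get remapped:
  ceph osd out osd.1036

  # Reweighting in CRUSH shrinks the containing host/rack/room buckets
  # too, so placements elsewhere in the hierarchy can shift as well:
  ceph osd crush reweight osd.1036 0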
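
And the single-rebalance removal this thread converges on, as a minimal
sketch: remove the stopped OSD from CRUSH directly rather than marking
it out first, so the bucket weights change only once. osd.1036 again
stands in for each OSD being decommissioned, and the final auth cleanup
is assumed standard practice rather than something discussed above:

  service ceph stop osd.1036     # stop the daemon on its host
  ceph osd crush rm osd.1036     # remove from CRUSH: the one and only
                                 # rebalance happens here
  ceph osd rm osd.1036           # remove it from the osdmap
  ceph auth del osd.1036         # drop its cephx key (assumed cleanup)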
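
To estimate how much data a CRUSH change will move before applying it,
crushtool can replay mappings against an edited map offline. A hedged
sketch; the rule id, replica count, and filenames are illustrative:

  ceph osd getcrushmap -o crush.orig     # grab the live CRUSH map
  crushtool -d crush.orig -o crush.txt   # decompile to editable text
  # ... edit crush.txt (e.g. remove osd.1036, adjust weights) ...
  crushtool -c crush.txt -o crush.new    # recompile
  crushtool --test -i crush.orig --show-mappings --rule 0 --num-rep 3 > before
  crushtool --test -i crush.new --show-mappings --rule 0 --num-rep 3 > after
  diff before after | grep -c '^>'       # rough count of changed mappings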