Re: Unexpected "out" OSD behaviour

Dear Jonas,

On 22.12.19 at 23:40, Jonas Jelten wrote:
> hi!
> 
> I've also noticed that behavior and submitted a patch some time ago that should fix (2):
> https://github.com/ceph/ceph/pull/27288

Thanks, this does indeed look very much like the issue I saw!
Luckily I'm not in a critical situation at the moment; I was just wondering whether this behaviour is normal (since it does not fit well
with the goal of ensuring the maximum possible redundancy at all times).

However, I observed this on 13.2.6, which, if I read the release notes correctly, should already contain your patch. Strange.

> But it may well be that there are more cases where PGs are not discovered on devices that do have them. Just recently a
> lot of my data was degraded and then recreated even though it would have been available on a node that had taken a very
> long time to reboot.

We've set "mon_osd_down_out_subtree_limit" to "host" to make sure recovery of data from entire hosts does not start without one of us admins
explicitly telling Ceph to go ahead. Maybe this also helps in your case?
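
For reference, a rough sketch of how that option can be set, assuming the centralized config store available since Mimic
(setting it in ceph.conf on the monitors works just as well):

    ceph config set mon mon_osd_down_out_subtree_limit host

    # or, equivalently, in ceph.conf on the monitors:
    # [mon]
    # mon_osd_down_out_subtree_limit = host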

> What you can also do is mark your OSD in and then out again right away; the data is discovered then. Although with my patch
> that shouldn't be necessary any more. Hope this helps you.

I will keep this in mind for the next time it happens (I may even be able to provoke it: we have to drain more nodes, and once the next node is almost empty,
I can just restart one of the "out" OSDs and see what happens).
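
In case it is useful, the quick in/out trick and the restart test boil down to something
like this (osd.42 is just a placeholder ID for one of the OSDs being drained):

    # force re-discovery of the PGs still held by the "out" OSD
    ceph osd in 42
    ceph osd out 42

    # to provoke the issue: restart an "out" OSD that still carries PGs
    systemctl restart ceph-osd@42

Watching "ceph -s" / "ceph pg dump" afterwards should show whether the affected PGs
go degraded or are found again.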

Cheers and many thanks,
	Oliver

> 
> Cheers
>   -- Jonas
> 
> 
> On 22/12/2019 19.48, Oliver Freyermuth wrote:
>> Dear Cephers,
>>
>> I realized the following behaviour only recently:
>>
>> 1. Marking an OSD "out" sets its weight to zero and allows data to be migrated away (as long as it is up),
>>    i.e. it is still considered a "source" and nothing enters degraded state (so far, everything as expected).
>> 2. Restarting an "out" OSD, however, means it will come back with "0 pgs", and if the data has not been fully migrated away yet,
>>    the PGs which were still kept on it will enter degraded state, since they now lack a copy / shard.
>>
>> Is (2) expected? 
>>
>> If so, my understanding that taking an OSD "out" lets its data be migrated away without losing any redundancy is wrong,
>> since redundancy will be lost as soon as the "out" OSD is restarted (e.g. due to a crash, node reboot, ...), and the only safe options would be:
>> 1. Disable the automatic balancer.
>> 2. Either adjust the weights of the OSDs to be drained to zero, or use pg upmap to drain them.
>> 3. Re-enable the automatic balancer only after having fully drained those OSDs and performed the necessary intervention
>>    (in our case, recreating the OSDs with a faster blockdb).
>>
>> Is this correct? 
>>
>> Cheers,
>> 	Oliver
>>
>>
> 

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
