Re: Bounding OSD memory requirements during peering/recovery

On Mon, Mar 9, 2015 at 8:42 AM, Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> Hi Sage,
>
> On Tue, Feb 10, 2015 at 2:51 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> On Mon, 9 Feb 2015, David McBride wrote:
>>> On 09/02/15 15:31, Gregory Farnum wrote:
>>>
>>> > So, memory usage of an OSD is usually linear in the number of PGs it
>>> > hosts. However, that memory can also grow based on at least one other
>>> > thing: the number of OSD Maps required to go through peering. It
>>> > *looks* to me like this is what you're running into, not growth in
>>> > the number of state machines. In particular, those past_intervals you
>>> > mentioned. ;)
>>>
>>> Hi Greg,
>>>
>>> Right, that sounds entirely plausible, and is very helpful.
>>>
>>> In practice, that means I'll need to be careful to avoid this situation
>>> occurring in production, but given that's unlikely to occur except in the
>>> case of non-trivial neglect, I don't think I need be particularly concerned.
>>>
>>> (Happily, I'm in the situation that my existing cluster is purely for testing
>>> purposes; the data is expendable.)
>>>
>>> That said, for my own peace of mind, it would be valuable to have a procedure
>>> that can be used to recover from this state, even if it's unlikely to occur in
>>> practice.
>>
>> The best luck I've had recovering from these situations is something like:
>>
>> - stop all osds
>> - osd set nodown
>> - osd set nobackfill
>> - osd set noup
>> - set map cache size smaller to reduce memory footprint.
>>
>>   osd map cache size = 50
>>   osd map max advance = 25
>>   osd map share max epochs = 25
>>   osd pg epoch persisted max stale = 25
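
For reference, here is roughly how that recipe translates into commands.
This is only a sketch using the same option names quoted above: stop the
OSD daemons through whatever init system you use, and keep in mind that
the defaults for these options can differ between releases.

  # with all OSD daemons stopped, set the flags from an admin/monitor node
  ceph osd set nodown
  ceph osd set nobackfill
  ceph osd set noup

  # then shrink the map cache in the [osd] section of ceph.conf on each
  # OSD host before bringing the daemons back up
  [osd]
      osd map cache size = 50
      osd map max advance = 25
      osd map share max epochs = 25
      osd pg epoch persisted max stale = 25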

Shrinking the map cache like that can cause extreme slowness if you get
into a failure situation and your OSDs need to calculate past intervals
across more maps than will fit in the cache. :(

That said, this might be a good idea as long as you're conscious of
needing to set it back if you get into trouble later on.
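
If you do shrink it and later need to undo it, something along these
lines should work once the cluster has settled. Again just a sketch: the
flags are cleared with 'ceph osd unset', and the injectargs call assumes
the old default map cache size of 500, which may not match your release.

  # clear the recovery flags
  ceph osd unset noup
  ceph osd unset nobackfill
  ceph osd unset nodown

  # bump the map cache back up on the running OSDs without a restart
  ceph tell osd.* injectargs '--osd-map-cache-size 500'

  # and drop the temporary settings from ceph.conf so the next restart
  # doesn't reintroduce the small cache
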
-Greg