Re: Cephadm host keeps trying to set osd_memory_target to less than minimum



I missed a step in the calculation. The memory_total_kb I mentioned
earlier is also multiplied by the value of
mgr/cephadm/autotune_memory_target_ratio before the subtractions for
all the daemons are done. That value defaults to 0.7, which might
explain why it seems to be getting a value lower than expected. Beyond
that, I'd need a list of the daemon types and counts on that host to
work through what it's doing.
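
To make that concrete, here is a rough sketch of the math for a
hypothetical 32 GB host running one mgr, one mon, one crash daemon, and
4 OSDs (your actual daemon mix may differ):

    total_bytes = 32 * 1024 ** 3   # memory_total_kb * 1024
    budget = total_bytes * 0.7     # autotune_memory_target_ratio default
    budget -= 4096 * 1048576       # mgr minimum
    budget -= 1024 * 1048576       # mon minimum
    budget -= 128 * 1048576        # crash minimum
    per_osd = int(budget / 4)      # what's left, split across the 4 OSDs
    # per_osd works out to roughly 4.3 GiB on these numbers

You can confirm what the ratio is set to with
`ceph config get mgr mgr/cephadm/autotune_memory_target_ratio`.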

On Wed, Mar 27, 2024 at 10:47 AM Mads Aasted <mads2a@xxxxxxxxx> wrote:

> Hi Adam.
> So doing the calculations with what you are stating here, I arrive at a
> total of roughly 13.3 GB for all the listed processes except the OSDs,
> leaving well in excess of 4 GB for each OSD.
> Besides the mon daemon, which I can tell has a limit of 2 GB on my host,
> none of the other daemons seem to have a limit set according to ceph orch
> ps. Then again, they are nowhere near the values stated in the
> min_size_by_type that you list.
> Obviously, yes, I could disable the autotuning, but that would leave me
> none the wiser as to why this exact host is trying to do this.
> On Tue, Mar 26, 2024 at 10:20 PM Adam King <adking@xxxxxxxxxx> wrote:
>> For context, the value the autotune goes with starts from the
>> "memory_total_kb" field reported by `cephadm gather-facts` on the host,
>> and then subtracts from that per daemon on the host according to
>>     min_size_by_type = {
>>         'mds': 4096 * 1048576,
>>         'mgr': 4096 * 1048576,
>>         'mon': 1024 * 1048576,
>>         'crash': 128 * 1048576,
>>         'keepalived': 128 * 1048576,
>>         'haproxy': 128 * 1048576,
>>         'nvmeof': 4096 * 1048576,
>>     }
>>     default_size = 1024 * 1048576
>> What's left is then divided by the number of OSDs on the host to arrive
>> at the value. I'll also add, since it seems to be an issue on this
>> particular host: if you add the "_no_autotune_memory" label to the host,
>> it will stop trying to do this on that host.
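>> For example, using the hostname from your log message:
>>     ceph orch host label add my-ceph01 _no_autotune_memory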
>> On Mon, Mar 25, 2024 at 6:32 PM <mads2a@xxxxxxxxx> wrote:
>>> I have a virtual Ceph cluster running 17.2.6 with 4 Ubuntu 22.04 hosts
>>> in it, each with 4 OSDs attached. The first 2 servers, which host the
>>> mgrs, have 32 GB of RAM each, and the remaining two have 24 GB.
>>> For some reason I am unable to identify, the first host in the cluster
>>> appears to be constantly trying to set the osd_memory_target variable to
>>> roughly half of the calculated minimum, and I see the following spamming
>>> the logs constantly:
>>> Unable to set osd_memory_target on my-ceph01 to 480485376: error parsing
>>> value: Value '480485376' is below minimum 939524096
>>> Default is set to 4294967296.
>>> I did double-check, and osd_memory_base (805306368) plus
>>> osd_memory_cache_min (134217728) adds up to that minimum exactly:
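>>>     805306368 + 134217728 = 939524096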
>>> osd_memory_target_autotune is currently enabled, but I cannot for the
>>> life of me figure out how it is arriving at 480485376 as a value for
>>> that particular host, which even has the most RAM. Neither the cluster
>>> nor the host is anywhere near maximum memory utilization, so it's not
>>> as though processes are competing for resources.
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
