Re: Cephadm host keeps trying to set osd_memory_target to less than minimum

The same experiment with the mds daemons pulling 4 GB instead of the 16 GB,
and with the starting total memory fixed (I accidentally used
memory_available_kb instead of memory_total_kb the first time), gives us:

DEBUG    cephadm.autotune:autotune.py:35 Autotuning OSD memory with given parameters:
Total memory: 23530995712
Daemons: [<DaemonDescription>(crash.a), <DaemonDescription>(grafana.a), <DaemonDescription>(mds.a), <DaemonDescription>(mds.b), <DaemonDescription>(mds.c), <DaemonDescription>(mgr.a), <DaemonDescription>(mon.a), <DaemonDescription>(node-exporter.a), <DaemonDescription>(osd.1), <DaemonDescription>(osd.2), <DaemonDescription>(osd.3), <DaemonDescription>(osd.4), <DaemonDescription>(prometheus.a)]
DEBUG    cephadm.autotune:autotune.py:50 Subtracting 134217728 from total for crash daemon
DEBUG    cephadm.autotune:autotune.py:52 new total: 23396777984
DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for grafana daemon
DEBUG    cephadm.autotune:autotune.py:52 new total: 22323036160
DEBUG    cephadm.autotune:autotune.py:40 Subtracting 4294967296 from total for mds daemon
DEBUG    cephadm.autotune:autotune.py:42 new total: 18028068864
DEBUG    cephadm.autotune:autotune.py:40 Subtracting 4294967296 from total for mds daemon
DEBUG    cephadm.autotune:autotune.py:42 new total: 13733101568
DEBUG    cephadm.autotune:autotune.py:40 Subtracting 4294967296 from total for mds daemon
DEBUG    cephadm.autotune:autotune.py:42 new total: 9438134272
DEBUG    cephadm.autotune:autotune.py:50 Subtracting 4294967296 from total for mgr daemon
DEBUG    cephadm.autotune:autotune.py:52 new total: 5143166976
DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for mon daemon
DEBUG    cephadm.autotune:autotune.py:52 new total: 4069425152
DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for node-exporter daemon
DEBUG    cephadm.autotune:autotune.py:52 new total: 2995683328
DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for prometheus daemon
DEBUG    cephadm.autotune:autotune.py:52 new total: 1921941504
DEBUG    cephadm.autotune:autotune.py:66 Final total is 1921941504 to be split among 4 OSDs
DEBUG    cephadm.autotune:autotune.py:68 Result is 480485376 per OSD

My understanding is, given a starting memory_total_kb of 32827840, we get
33615708160 total bytes. We multiply that by the 0.7 autotune ratio to get
23530995712 bytes to be split among the daemons (roughly 23-24 GB). Then the
mgr and each of the three mds daemons get 4 GB; grafana, mon, node-exporter,
and prometheus each take 1 GB; and the crash daemon gets 128 MB. That leaves
only about 2 GB to split among the 4 OSDs, which is how we arrive at the
480485376 bytes per OSD from the original error message you posted.

> Unable to set osd_memory_target on my-ceph01 to 480485376: error parsing
> value: Value '480485376' is below minimum 939524096


As that value is well below the minimum (it's only about half a GB), it
reports that error when trying to set it.
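
For anyone who wants to replay the math, here is a minimal Python sketch (not
the actual cephadm code, just the same arithmetic with this host's numbers
hard-coded) that reproduces the totals in the log above. The 4 GiB per mds
comes from your mds_cache_memory_limit, and the other per-daemon sizes mirror
the min_size_by_type table quoted further down in this thread:

    GiB = 1024 ** 3

    memory_total_kb = 32827840   # from `cephadm gather-facts` on my-ceph01
    autotune_ratio = 0.7         # mgr/cephadm/autotune_memory_target_ratio

    # Bytes reserved for each non-OSD daemon on the host; the three mds
    # entries use mds_cache_memory_limit (4294967296) rather than a fixed size.
    reserved = {
        'crash': 128 * 1024 ** 2,
        'grafana': 1 * GiB,
        'mds.a': 4 * GiB,
        'mds.b': 4 * GiB,
        'mds.c': 4 * GiB,
        'mgr': 4 * GiB,
        'mon': 1 * GiB,
        'node-exporter': 1 * GiB,
        'prometheus': 1 * GiB,
    }
    num_osds = 4

    total = int(memory_total_kb * 1024 * autotune_ratio)   # 23530995712
    for size in reserved.values():
        total -= size                                       # ends at 1921941504
    per_osd = total // num_osds

    print(per_osd)   # 480485376, below the 939524096 minimum

Swapping in another host's memory_total_kb and daemon list should land close
to the value cephadm tries to set there, going by the logic described in this
thread.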

On Tue, Apr 9, 2024 at 12:58 PM Mads Aasted <mads2a@xxxxxxxxx> wrote:

> Hi Adam
>
> Seems like mds_cache_memory_limit, both set globally through cephadm and
> on the host's mds daemons, is set to approx. 4 GB:
> root@my-ceph01:/# ceph config get mds mds_cache_memory_limit
> 4294967296
> Same if I query the individual mds daemons running on my-ceph01, or any of
> the other mds daemons on the other hosts.
>
> On Tue, Apr 9, 2024 at 6:14 PM Mads Aasted <mads2a@xxxxxxxxx> wrote:
>
>> Hi Adam
>>
>> Let me just finish tucking in a devilish tyke here and I'll get to it
>> first thing.
>>
>> On Tue, Apr 9, 2024 at 6:09 PM Adam King <adking@xxxxxxxxxx> wrote:
>>
>>> I did end up writing a unit test to see what we calculated here, as well
>>> as adding a bunch of debug logging (haven't created a PR yet, but probably
>>> will).  The total memory was set to (19858056 * 1024 * 0.7) (total memory
>>> in bytes * the autotune target ratio) = 14234254540. What ended up getting
>>> logged was (ignore the daemon id for the daemons, they don't affect
>>> anything. Only the types matter)
>>>
>>> DEBUG    cephadm.autotune:autotune.py:35 Autotuning OSD memory with given parameters:
>>> Total memory: 14234254540
>>> Daemons: [<DaemonDescription>(crash.a), <DaemonDescription>(grafana.a), <DaemonDescription>(mds.a), <DaemonDescription>(mds.b), <DaemonDescription>(mds.c), <DaemonDescription>(mgr.a), <DaemonDescription>(mon.a), <DaemonDescription>(node-exporter.a), <DaemonDescription>(osd.1), <DaemonDescription>(osd.2), <DaemonDescription>(osd.3), <DaemonDescription>(osd.4), <DaemonDescription>(prometheus.a)]
>>> DEBUG    cephadm.autotune:autotune.py:50 Subtracting 134217728 from total for crash daemon
>>> DEBUG    cephadm.autotune:autotune.py:52 new total: 14100036812
>>> DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for grafana daemon
>>> DEBUG    cephadm.autotune:autotune.py:52 new total: 13026294988
>>> DEBUG    cephadm.autotune:autotune.py:40 Subtracting 17179869184 from total for mds daemon
>>> DEBUG    cephadm.autotune:autotune.py:42 new total: -4153574196
>>> DEBUG    cephadm.autotune:autotune.py:40 Subtracting 17179869184 from total for mds daemon
>>> DEBUG    cephadm.autotune:autotune.py:42 new total: -21333443380
>>> DEBUG    cephadm.autotune:autotune.py:40 Subtracting 17179869184 from total for mds daemon
>>> DEBUG    cephadm.autotune:autotune.py:42 new total: -38513312564
>>> DEBUG    cephadm.autotune:autotune.py:50 Subtracting 4294967296 from total for mgr daemon
>>> DEBUG    cephadm.autotune:autotune.py:52 new total: -42808279860
>>> DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for mon daemon
>>> DEBUG    cephadm.autotune:autotune.py:52 new total: -43882021684
>>> DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for node-exporter daemon
>>> DEBUG    cephadm.autotune:autotune.py:52 new total: -44955763508
>>> DEBUG    cephadm.autotune:autotune.py:50 Subtracting 1073741824 from total for prometheus daemon
>>> DEBUG    cephadm.autotune:autotune.py:52 new total: -46029505332
>>>
>>> It looks like it was taking pretty much all the memory away for the mds
>>> daemons. The amount, however, is taken from the "mds_cache_memory_limit"
>>> setting for each mds daemon. The number it was defaulting to for the test
>>> is quite large. I guess I'd need to know what that comes out to for the mds
>>> daemons in your cluster to get a full picture. Also, you can see the total
>>> go well into the negatives here. When that happens cephadm just tries to
>>> remove the osd_memory_target config settings for the OSDs on the host, but
>>> given the error message from your initial post, it must be getting some
>>> positive value when actually running on your system.
>>>
>>> On Fri, Apr 5, 2024 at 2:21 AM Mads Aasted <mads2a@xxxxxxxxx> wrote:
>>>
>>>> Hi Adam
>>>> No problem, i really appreciate your input :)
>>>> The memory stats returned are as follows
>>>>   "memory_available_kb": 19858056,
>>>>   "memory_free_kb": 277480,
>>>>   "memory_total_kb": 32827840,
>>>>
>>>> On Thu, Apr 4, 2024 at 10:14 PM Adam King <adking@xxxxxxxxxx> wrote:
>>>>
>>>>> Sorry to keep asking for more info, but can I also get what `cephadm
>>>>> gather-facts` on that host returns for "memory_total_kb". Might end up
>>>>> creating a unit test out of this case if we have a calculation bug here.
>>>>>
>>>>> On Thu, Apr 4, 2024 at 4:05 PM Mads Aasted <mads2a@xxxxxxxxx> wrote:
>>>>>
>>>>>> Sorry for the double send, I forgot to hit reply all so it would appear
>>>>>> on the page.
>>>>>>
>>>>>> Hi Adam
>>>>>>
>>>>>> If we multiply by 0.7 and work through the previous example from that
>>>>>> number, we would still arrive at roughly 2.5 GB for each OSD, and the
>>>>>> host in question is trying to set it to less than 500 MB.
>>>>>> I have attached a list of the processes running on the host. Currently
>>>>>> you can see that the OSDs are taking up the most memory by far, each
>>>>>> using at least 5x the proposed value.
>>>>>> root@my-ceph01:/# ceph orch ps | grep my-ceph01
>>>>>> crash.my-ceph01              my-ceph01               running (3w)  7m ago  13M   9052k      -  17.2.6
>>>>>> grafana.my-ceph01            my-ceph01  *:3000       running (3w)  7m ago  13M   95.6M      -  8.3.5
>>>>>> mds.testfs.my-ceph01.xjxfzd  my-ceph01               running (3w)  7m ago  10M    485M      -  17.2.6
>>>>>> mds.prodfs.my-ceph01.rplvac  my-ceph01               running (3w)  7m ago  12M   26.9M      -  17.2.6
>>>>>> mds.prodfs.my-ceph01.twikzd  my-ceph01               running (3w)  7m ago  12M   26.2M      -  17.2.6
>>>>>> mgr.my-ceph01.rxdefe         my-ceph01  *:8443,9283  running (3w)  7m ago  13M    907M      -  17.2.6
>>>>>> mon.my-ceph01                my-ceph01               running (3w)  7m ago  13M    503M  2048M  17.2.6
>>>>>> node-exporter.my-ceph01      my-ceph01  *:9100       running (3w)  7m ago  13M   20.4M      -  1.5.0
>>>>>> osd.3                        my-ceph01               running (3w)  7m ago  11M   2595M  4096M  17.2.6
>>>>>> osd.5                        my-ceph01               running (3w)  7m ago  11M   2494M  4096M  17.2.6
>>>>>> osd.6                        my-ceph01               running (3w)  7m ago  11M   2698M  4096M  17.2.6
>>>>>> osd.9                        my-ceph01               running (3w)  7m ago  11M   3364M  4096M  17.2.6
>>>>>> prometheus.my-ceph01         my-ceph01  *:9095       running (3w)  7m ago  13M    164M      -  2.42.0
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Mar 28, 2024 at 2:13 AM Adam King <adking@xxxxxxxxxx> wrote:
>>>>>>
>>>>>>>  I missed a step in the calculation. The total_memory_kb I mentioned
>>>>>>> earlier is also multiplied by the value of the
>>>>>>> mgr/cephadm/autotune_memory_target_ratio before doing the subtractions for
>>>>>>> all the daemons. That value defaults to 0.7. That might explain it seeming
>>>>>>> like it's getting a value lower than expected. Beyond that, I think I'd
>>>>>>> need a list of the daemon types and counts on that host to try and work
>>>>>>> through what it's doing.
>>>>>>>
>>>>>>> On Wed, Mar 27, 2024 at 10:47 AM Mads Aasted <mads2a@xxxxxxxxx>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Adam.
>>>>>>>>
>>>>>>>> Doing the calculations with what you are stating here, I arrive at a
>>>>>>>> total of roughly 13.3 GB for all the listed processes except the OSDs,
>>>>>>>> leaving well in excess of 4 GB for each OSD.
>>>>>>>> Besides the mon daemon, which I can tell on my host has a limit of
>>>>>>>> 2 GB, none of the other daemons seem to have a limit set according to
>>>>>>>> ceph orch ps. Then again, they are nowhere near the values stated in
>>>>>>>> the min_size_by_type you list.
>>>>>>>> Obviously yes, I could disable the autotuning, but that would leave me
>>>>>>>> none the wiser as to why this exact host is trying to do this.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Mar 26, 2024 at 10:20 PM Adam King <adking@xxxxxxxxxx>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> For context, the value the autotune goes with takes the value from
>>>>>>>>> `cephadm gather-facts` on the host (the "memory_total_kb" field) and then
>>>>>>>>> subtracts from that per daemon on the host according to
>>>>>>>>>
>>>>>>>>>     min_size_by_type = {
>>>>>>>>>         'mds': 4096 * 1048576,
>>>>>>>>>         'mgr': 4096 * 1048576,
>>>>>>>>>         'mon': 1024 * 1048576,
>>>>>>>>>         'crash': 128 * 1048576,
>>>>>>>>>         'keepalived': 128 * 1048576,
>>>>>>>>>         'haproxy': 128 * 1048576,
>>>>>>>>>         'nvmeof': 4096 * 1048576,
>>>>>>>>>     }
>>>>>>>>>     default_size = 1024 * 1048576
>>>>>>>>>
>>>>>>>>> what's left is then divided by the number of OSDs on the host to
>>>>>>>>> arrive at the value. I'll also add, since it seems to be an issue on
>>>>>>>>> this particular host, that if you add the "_no_autotune_memory" label
>>>>>>>>> to the host, it will stop trying to do this on that host.
>>>>>>>>>
>>>>>>>>> On Mon, Mar 25, 2024 at 6:32 PM <mads2a@xxxxxxxxx> wrote:
>>>>>>>>>
>>>>>>>>>> I have a virtual Ceph cluster running 17.2.6 with 4 Ubuntu 22.04
>>>>>>>>>> hosts in it, each with 4 OSDs attached. The first 2 servers hosting
>>>>>>>>>> mgrs have 32 GB of RAM each, and the remaining have 24 GB.
>>>>>>>>>> For some reason I am unable to identify, the first host in the
>>>>>>>>>> cluster appears to constantly be trying to set the osd_memory_target
>>>>>>>>>> variable to roughly half of the calculated minimum. I see the
>>>>>>>>>> following spamming the logs constantly:
>>>>>>>>>> Unable to set osd_memory_target on my-ceph01 to 480485376: error
>>>>>>>>>> parsing value: Value '480485376' is below minimum 939524096
>>>>>>>>>> The default is set to 4294967296.
>>>>>>>>>> I did double check, and osd_memory_base (805306368) +
>>>>>>>>>> osd_memory_cache_min (134217728) adds up to that minimum exactly.
>>>>>>>>>> osd_memory_target_autotune is currently enabled, but I cannot for
>>>>>>>>>> the life of me figure out how it is arriving at 480485376 for that
>>>>>>>>>> particular host, which even has the most RAM. Neither the cluster
>>>>>>>>>> nor the host is anywhere near max memory utilization, so it's not
>>>>>>>>>> like there are processes competing for resources.
>>>>>>>>>> _______________________________________________
>>>>>>>>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>>>>>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>>>>>>>>
>>>>>>>>>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



