To follow up on this issue, I saw the additional comments on
https://tracker.ceph.com/issues/59580 regarding mgr caps.
By setting the mgr user caps back to the default, I was able to reduce
the memory leak from several hundred MB/hr to just a few MB/hr.
As the other commenter had posted, in order for Zabbix to access OSD
data via RESTful, the mgr caps had been set to:

ceph auth caps mgr.controller04.lvhgea mon 'allow *' osd 'allow *' mds 'allow *'
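
For reference, putting the caps back is roughly the following (I'm assuming
the stock cephadm mgr caps of mon 'profile mgr', osd 'allow *', mds 'allow *'
here; 'ceph auth get' on a freshly deployed mgr key will show the exact
defaults):

# show what is currently set for this mgr key
ceph auth get mgr.controller04.lvhgea

# reset to the (assumed) cephadm defaults
ceph auth caps mgr.controller04.lvhgea mon 'profile mgr' osd 'allow *' mds 'allow *'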
Gary
On 2023-04-27 08:38, Gary Molenkamp wrote:
Good morning,
After upgrading from Octopus (15.2.17) to Pacific (16.2.12) two days
ago, I'm noticing that the MGR daemons keep failing over to the standby
and back every 24 hours. Watching the output of 'ceph orch ps', I can
see that the memory consumption of the mgr grows steadily until the
daemon becomes unresponsive.
When the mgr becomes unresponsive, tasks such as RESTful calls start
to fail, and the standby eventually takes over after ~20 minutes. I've
included a log of memory consumption (in 10-minute intervals) at the
end of this message. While the cluster recovers on its own, the loss of
usage data during the outage, and the fact that this is happening at
all, are problematic. Any assistance would be appreciated.
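
For anyone wanting to gather similar numbers, a simple loop like the
following is enough (just a sketch; the grep pattern and log file name
are arbitrary):

# append a snapshot of mgr memory use every 10 minutes
while true; do
    date
    ceph orch ps | grep '^mgr\.'
    sleep 600
done >> mgr-memory.log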
Note, this is a cluster that has been upgraded from an original
Jewel-based deployment using FileStore, through the BlueStore
conversion and the container conversion, and now to Pacific. The data
below shows memory use with three mgr modules enabled: cephadm,
restful, and iostat. By disabling iostat, I can reduce the rate of
memory growth to about 200MB/hr.
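
For reference, the iostat module can be toggled on and off while
testing with:

# list enabled and available mgr modules
ceph mgr module ls

# disable iostat and watch the effect on mgr memory growth
ceph mgr module disable iostat

# re-enable it afterwards
ceph mgr module enable iostat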
Thanks
Gary.
--
Gary Molenkamp              Science Technology Services
Systems Administrator       University of Western Ontario
molenkam@xxxxxx             http://sts.sci.uwo.ca
(519) 661-2111 x86882       (519) 661-3566