Hi,

I do not believe this is actively being worked on, but there is a
tracker open; if you can submit an update there it may help attract
attention and lead to a fix: https://tracker.ceph.com/issues/59580

David

On Fri, Sep 8, 2023, at 03:29, Chris Palmer wrote:
> I first posted this on 17 April but did not get any response
> (although IIRC a number of other posts referred to it). Seeing as
> MGR OOM is being discussed at the moment, I am re-posting. These
> clusters are not containerized.
>
> Is this being tracked/fixed or not?
>
> Thanks, Chris
>
> -------------------------------
>
> We've hit a memory leak in the Manager RESTful interface in versions
> 17.2.5 and 17.2.6. On our main production cluster the active MGR grew
> to about 60 GB until the oom_reaper killed it, causing a successful
> failover and a restart of the failed daemon. We can now see that the
> problem recurs on all three of our clusters.
>
> We've traced this to enabling full Ceph monitoring by Zabbix last
> week. The leak is about 20 GB per day, and seems to be proportional
> to the number of PGs. For some time we just had the default settings,
> and no memory leak, but had not got around to finding out why many of
> the Zabbix items were showing as "access denied". We traced this to
> the MGR's MON caps, which were "mon 'profile mgr'".
>
> The MON logs showed recurring:
>
>     log_channel(audit) log [DBG] : from='mgr.284576436
>     192.168.xxx.xxx:0/2356365' entity='mgr.host1' cmd=[{"format":
>     "json", "prefix": "pg dump"}]: access denied
>
> Changing the MGR caps to "mon 'allow *'" and restarting the MGR
> immediately allowed that command to work, and all the follow-on REST
> calls succeeded:
>
>     log_channel(audit) log [DBG] : from='mgr.283590200
>     192.168.xxx.xxx:0/1779' entity='mgr.host1' cmd=[{"format":
>     "json", "prefix": "pg dump"}]: dispatch
>
> However, it also caused the memory leak to start.
>
> We've reverted the caps and are back to how we were.
>
> Two questions:
>
> 1) No matter what the REST consumer is doing, the MGR should not
>    accumulate memory, especially as we can see that the REST TCP
>    connections have been closed. Is there anything more we can do to
>    diagnose this?
>
> 2) Setting "allow *" worked, but is there a better setting that
>    allows just the "pg dump" call (in addition to profile mgr)?
>
> Thanks, Chris
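On question 1, a first step before updating the tracker could be to
sample the active MGR's resident set size and correlate the growth
with the Zabbix polling interval. A minimal sketch, assuming a
non-containerized host (as stated above) where ceph-mgr is visible to
ps; the log path is illustrative only:

    # Sample the active ceph-mgr's RSS (in KiB) once a minute so that
    # growth can be correlated with REST polling activity.
    while true; do
        printf '%s %s\n' "$(date -Is)" "$(ps -o rss= -C ceph-mgr)"
        sleep 60
    done >> /tmp/ceph-mgr-rss.log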
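If the ceph-mgr build links tcmalloc, the generic heap introspection
that Ceph daemons expose might also help narrow down where the memory
sits. Whether the 17.2.x mgr accepts these tell commands is an
assumption worth verifying first, not something confirmed in this
thread:

    # tcmalloc heap introspection via the tell interface (assumed to
    # be supported by the mgr here; verify on your release first).
    ceph tell mgr heap stats
    ceph tell mgr heap start_profiler
    # ...reproduce the leak for a while, then:
    ceph tell mgr heap dump
    ceph tell mgr heap stop_profiler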
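On question 2, MON caps can grant individual commands by name, so in
principle the blanket "allow *" could be narrowed to the one denied
command. An untested sketch, keeping the standard mgr profile and
adding only "pg dump"; the entity name mgr.host1 is taken from the
log excerpts above:

    # Untested: standard mgr profile plus just the "pg dump" command,
    # rather than blanket 'allow *'. Substitute your own entity name.
    ceph auth caps mgr.host1 \
        mon 'profile mgr, allow command "pg dump"' \
        osd 'allow *' mds 'allow *'

This is the generic allow-command cap syntax rather than something
confirmed in the thread, so it would need checking against the same
audit-log test shown above.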