Re: MGR Memory Leak in Restful




I do not believe this is actively being worked on, but there is a tracker open. If you can submit an update there, it may help attract attention and get a fix developed:


On Fri, Sep 8, 2023, at 03:29, Chris Palmer wrote:
> I first posted this on 17 April but did not get any response (although 
> IIRC a number of other posts referred to it).
> Seeing as MGR OOM is being discussed at the moment I am re-posting.
> These clusters are not containerized.
> Is this being tracked/fixed or not?
> Thanks, Chris
> -------------------------------
> We've hit a memory leak in the Manager Restful interface, in versions 
> 17.2.5 & 17.2.6. On our main production cluster the active MGR grew to 
> about 60G until the oom_reaper killed it, causing a successful failover 
> and restart of the failed one. We can then see that the problem is 
> recurring, actually on all 3 of our clusters.
> We've traced this to when we enabled full Ceph monitoring by Zabbix last 
> week. The leak is about 20GB per day, and seems to be proportional to 
> the number of PGs. For some time we just had the default settings, and 
> no memory leak, but had not got around to finding why many of the Zabbix 
> items were showing as Access Denied. We traced this to the MGR's MON 
> CAPS which were "mon 'profile mgr'".
> The MON logs showed recurring:
> log_channel(audit) log [DBG] : from='mgr.284576436 ' entity='mgr.host1' cmd=[{"format": "json", "prefix": "pg dump"}]:  access denied
> Changing the MGR CAPS to "mon 'allow *'" and restarting the MGR 
> immediately allowed that to work, and all the follow-on REST calls worked.
> log_channel(audit) log [DBG] : from='mgr.283590200 ' entity='mgr.host1' cmd=[{"format": "json", "prefix": "pg dump"}]: dispatch
> However it has also caused the memory leak to start.
> We've reverted the CAPS and are back to how we were.
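
If it helps to confirm which MGR commands are being denied versus dispatched before and after a CAPS change, the audit lines quoted above can be tallied with a small script. This is only a sketch: it assumes each audit entry sits on a single line in the MON log and follows the entity='...' cmd=[...]: <outcome> shape shown in the quoted lines. The function name tally_audit is made up for illustration and is not part of Ceph.

```python
import json
import re
from collections import Counter

# Matches the tail of an audit line as quoted in this thread, e.g.
#   ... entity='mgr.host1' cmd=[{"format": "json", "prefix": "pg dump"}]:  access denied
# Only the two outcomes seen in the thread are recognised (assumption).
AUDIT_RE = re.compile(
    r"entity='(?P<entity>[^']*)' cmd=(?P<cmd>\[.*\]):\s*"
    r"(?P<outcome>access denied|dispatch)"
)


def tally_audit(lines):
    """Count audit entries per (entity, command prefix, outcome)."""
    counts = Counter()
    for line in lines:
        match = AUDIT_RE.search(line)
        if not match:
            continue  # not an audit line in the expected shape
        try:
            cmd = json.loads(match.group("cmd"))
        except json.JSONDecodeError:
            continue  # skip entries whose cmd field is not valid JSON
        first = cmd[0] if cmd and isinstance(cmd[0], dict) else {}
        prefix = first.get("prefix", "?")
        counts[(match.group("entity"), prefix, match.group("outcome"))] += 1
    return counts
```

Feeding it the lines of a MON log file (for example, an open file object) returns a Counter, so a key such as ("mgr.host1", "pg dump", "access denied") shows how often that call was refused.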
> Two questions:
> 1) No matter what the REST consumer is doing, the MGR should not 
> accumulate memory, especially as we can see that the REST TCP 
> connections have wrapped up. Is there anything more we can do to 
> diagnose this?
> 2) Setting "allow *" worked, but is there a better setting just to 
> allow the "pg dump" call (in addition to profile mgr)?
> Thanks, Chris
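
On question 1, one thing that could go into a tracker update is a measured growth rate for the active MGR. Below is a minimal sketch, assuming a Linux host (it reads VmRSS from /proc/<pid>/status) and that you pass in the PID of the ceph-mgr process yourself. All function names are made up for illustration, and the rate is a simple first-to-last slope, not a proper regression.

```python
import re
import time

# VmRSS line in /proc/<pid>/status looks like: "VmRSS:\t  123456 kB"
_VMRSS_RE = re.compile(r"^VmRSS:\s+(\d+)\s+kB", re.MULTILINE)


def parse_vm_rss_kib(status_text):
    """Return the resident set size in KiB, or None if VmRSS is absent."""
    match = _VMRSS_RE.search(status_text)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current RSS of a process from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as status_file:
        return parse_vm_rss_kib(status_file.read())


def growth_gib_per_day(samples):
    """Estimate growth from the first and last (unix_time, rss_kib) samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    (t0, rss0), (t1, rss1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    # KiB per second -> KiB per day -> GiB per day
    return (rss1 - rss0) * 86400 / (t1 - t0) / 2**20


def watch(pid, interval_s=300, count=12):
    """Collect `count` RSS samples, `interval_s` seconds apart."""
    samples = []
    for _ in range(count):
        rss = read_rss_kib(pid)
        if rss is not None:
            samples.append((time.time(), rss))
        time.sleep(interval_s)
    return samples
```

Calling growth_gib_per_day(watch(pid)) once with the CAPS set to "profile mgr" and once with "allow *" would give two figures to compare against the roughly 20GB per day reported above.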
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
