Re: ceph_leadership_team_meeting_s18e06.mkv

Loïc Tortay <tortay@xxxxxxxxxxx> · Fri, 8 Sep 2023 11:00:31 +0200

On 07/09/2023 21:33, Mark Nelson wrote:
> Hi Rok,
>
> We're still trying to catch what's causing the memory growth, so it's hard
> to guess at which releases are affected.  We know it's happening
> intermittently on a live Pacific cluster at least.  If you have the
> ability to catch it while it's happening, there are several
> approaches/tools that might aid in diagnosing it. Container deployments
> are a bit tougher to get debugging tools working in, though, which afaik
> has slowed down existing attempts at diagnosing the issue.
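One low-tech way to "catch it while it's happening" is to sample the resident memory of the ceph-mgr process over time. The sketch below is a minimal illustration, not a Ceph tool. It assumes a Linux host, that the daemon's process name (`/proc/<pid>/comm`) is `ceph-mgr`, and that the containerized daemon is visible in the host's `/proc`, which is usually the case. The helper names (`read_rss_kib`, `find_pids`, `sample`, `growth_kib_per_hour`) are hypothetical. It only shows *that* memory is growing and how fast. Finding *why* still needs a heap profiler.

```python
import os
import time
from pathlib import Path


def read_rss_kib(pid: int) -> int:
    """Return the resident set size (VmRSS) of a process in KiB, read from /proc."""
    for line in Path(f"/proc/{pid}/status").read_text().splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:\t  123456 kB"; take the number.
            return int(line.split()[1])
    raise ValueError(f"no VmRSS entry for pid {pid}")


def find_pids(name: str) -> list[int]:
    """Return the PIDs whose /proc/<pid>/comm equals `name` (assumed "ceph-mgr")."""
    pids = []
    for entry in Path("/proc").iterdir():
        if not entry.name.isdigit():
            continue
        try:
            if (entry / "comm").read_text().strip() == name:
                pids.append(int(entry.name))
        except OSError:
            continue  # the process exited or its /proc entry is unreadable
    return sorted(pids)


def sample(pid: int, interval_s: float, count: int) -> list[tuple[float, int]]:
    """Take `count` (timestamp, RSS KiB) samples, `interval_s` seconds apart."""
    samples = []
    for i in range(count):
        if i:
            time.sleep(interval_s)
        samples.append((time.time(), read_rss_kib(pid)))
    return samples


def growth_kib_per_hour(samples: list[tuple[float, int]]) -> float:
    """Average RSS growth rate between the first and last sample, in KiB/hour."""
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    if t1 == t0:
        return 0.0
    return (r1 - r0) * 3600 / (t1 - t0)


if __name__ == "__main__":
    pids = find_pids("ceph-mgr")
    # Fall back to this script's own PID so the sketch also runs on a host without ceph-mgr.
    # On a host running several ceph-mgr daemons, pick the active one instead of pids[0].
    pid = pids[0] if pids else os.getpid()
    history = sample(pid, interval_s=1.0, count=3)
    for ts, rss in history:
        print(f"{time.strftime('%H:%M:%S', time.localtime(ts))} pid={pid} rss={rss} KiB")
    print(f"growth: {growth_kib_per_hour(history):.1f} KiB/h")
```

In practice the interval would be minutes rather than seconds, with the output logged to a file. A steadily positive growth rate on the active MGR, and not on the standbys, would match the OOM pattern described in this thread.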

Hello,
We have a cluster recently upgraded from Octopus to Pacific 16.2.13 
where the active MGR was OOM-killed a few times.

We have another cluster that was recently upgraded from 16.2.11 to
16.2.14, and the issue also started to appear very soon afterwards on that
cluster. We did not have the issue during the months it was running 16.2.11.

In short: the issue seems to be due to a change in 16.2.12 or 16.2.13.
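The conclusion above is a bracketing argument. The last release seen without the problem is 16.2.11. The earliest release seen with it is 16.2.13. If the regression persists in every later release (an assumption), it was introduced after 16.2.11, up to and including 16.2.13. A minimal sketch of that check follows. The function names are hypothetical, and it assumes a single major.minor series with consecutive patch numbers, as Pacific 16.2.x has.

```python
def parse(version: str) -> tuple[int, int, int]:
    """Turn "16.2.13" into (16, 2, 13) so versions compare numerically."""
    major, minor, patch = (int(p) for p in version.split("."))
    return major, minor, patch


def suspect_releases(observations: list[tuple[str, bool]]) -> list[str]:
    """Return the releases that could have introduced a regression.

    `observations` pairs a version with whether the problem was seen on it.
    Assumes the bug, once introduced, is present in every later release.
    """
    good = [parse(v) for v, affected in observations if not affected]
    bad = [parse(v) for v, affected in observations if affected]
    last_good, first_bad = max(good), min(bad)
    if last_good >= first_bad:
        raise ValueError("observations contradict the monotonic-regression assumption")
    if last_good[:2] != first_bad[:2]:
        raise ValueError("this sketch only handles a single major.minor series")
    major, minor, _ = last_good
    # Candidates: every patch release after the last good one, up to the first bad one.
    return [f"{major}.{minor}.{p}" for p in range(last_good[2] + 1, first_bad[2] + 1)]


if __name__ == "__main__":
    # Pacific observations reported in this thread.
    seen = [("16.2.11", False), ("16.2.13", True), ("16.2.14", True)]
    print(suspect_releases(seen))  # ['16.2.12', '16.2.13']
```

A report of the issue on an unmodified 16.2.12 cluster would narrow the window to that single release.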

Loïc.
--
|       Loïc Tortay <tortay@xxxxxxxxxxx> - IN2P3 Computing Centre      |
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx