Re: Ceph 16.2.14: ceph-mgr getting oom-killed

Hi,

Today, after 3 weeks of normal operation, the mgr reached a memory usage of
1600 MB, then quickly ballooned to over 100 GB for no apparent reason and got
oom-killed again. There were no suspicious messages in the logs until the
message indicating that the mgr had failed to allocate more memory. Any
thoughts?
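
In case it helps anyone else watching for this, a crude way to at least capture
the ramp-up next time is to log the mgr's RSS periodically. A minimal sketch
(log path and interval are arbitrary, adjust to taste):

# log the RSS (in KiB) of every ceph-mgr process once a minute
while true; do
    date '+%F %T' >> /var/log/ceph-mgr-rss.log
    ps -o pid=,rss=,etime=,args= -C ceph-mgr >> /var/log/ceph-mgr-rss.log
    sleep 60
done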

/Z

On Mon, 11 Dec 2023 at 12:34, Zakhar Kirpichenko <zakhar@xxxxxxxxx> wrote:

> Hi,
>
> Another update: after 2 more weeks the mgr process grew to ~1.5 GB, which
> again was expected:
>
> mgr.ceph01.vankui     ceph01  *:8443,9283  running (2w)    102s ago   2y  1519M        -  16.2.14  fc0182d6cda5  3451f8c6c07e
> mgr.ceph02.shsinf     ceph02  *:8443,9283  running (2w)    102s ago   7M   112M        -  16.2.14  fc0182d6cda5  1c3d2d83b6df
>
> The cluster is healthy and operating normally, and the mgr process is
> growing slowly. It's still unclear what caused the ballooning and OOM issue
> under very similar conditions.
>
> /Z
>
> On Sat, 25 Nov 2023 at 08:31, Zakhar Kirpichenko <zakhar@xxxxxxxxx> wrote:
>
>> Hi,
>>
>> A small update: after disabling the 'progress' module (see the note at the
>> end of this message), the active mgr (on ceph01) used up ~1.3 GB of memory
>> in 3 days, which was expected:
>>
>> mgr.ceph01.vankui     ceph01  *:8443,9283  running (3d)      9m ago   2y    1284M        -  16.2.14  fc0182d6cda5  3451f8c6c07e
>> mgr.ceph02.shsinf     ceph02  *:8443,9283  running (3d)      9m ago   7M     374M        -  16.2.14  fc0182d6cda5  1c3d2d83b6df
>>
>> The cluster is healthy and operating normally. The mgr process is growing
>> slowly, at roughly 1-2 MB per 10 minutes, which is nowhere near enough to
>> balloon to over 100 GB RSS within a few days; this likely means that
>> whatever triggers the issue happens randomly and quite suddenly. I'll
>> continue monitoring the mgr and get back with more observations.
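>>
>> For reference, the note promised above: 'progress' is one of the always-on
>> mgr modules; as far as I understand, the supported way to switch its event
>> tracking off (and back on) is something like:
>>
>> ceph progress off     # stop the progress module from tracking events
>> ceph progress on      # turn it back on later if needed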
>>
>> /Z
>>
>> On Wed, 22 Nov 2023 at 16:33, Zakhar Kirpichenko <zakhar@xxxxxxxxx>
>> wrote:
>>
>>> Thanks for this. It looks similar to what we're observing, although we
>>> don't use the API apart from what the Ceph deployment itself uses - which
>>> I guess still counts.
>>>
>>> /Z
>>>
>>> On Wed, 22 Nov 2023, 15:22 Adrien Georget, <adrien.georget@xxxxxxxxxxx>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> This memory leak in ceph-mgr seems to be due to a change in Ceph 16.2.12.
>>>> Check this issue: https://tracker.ceph.com/issues/59580
>>>> We are also affected by this, with or without containerized services.
>>>>
>>>> Cheers,
>>>> Adrien
>>>>
>>>> Le 22/11/2023 à 14:14, Eugen Block a écrit :
>>>> > One other difference is you use docker, right? We use podman, could it
>>>> > be some docker restriction?
>>>> >
>>>> > Zitat von Zakhar Kirpichenko <zakhar@xxxxxxxxx>:
>>>> >
>>>> >> It's a 6-node cluster with 96 OSDs and not much I/O. Each node has
>>>> >> 384 GB of RAM, each OSD has a memory target of 16 GB, and about
>>>> >> 100 GB of memory, give or take, is available (mostly used by page
>>>> >> cache) on each node during normal operation. Nothing unusual there,
>>>> >> tbh.
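>>>> >>
>>>> >> (By "memory target" I mean osd_memory_target; a value like that is
>>>> >> typically set cluster-wide with something like
>>>> >>
>>>> >> ceph config set osd osd_memory_target 17179869184   # 16 GiB in bytes
>>>> >>
>>>> >> in case that's relevant for the comparison.)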
>>>> >>
>>>> >> No unusual mgr modules or settings either, except for the disabled
>>>> >> 'progress' module:
>>>> >>
>>>> >> {
>>>> >>     "always_on_modules": [
>>>> >>         "balancer",
>>>> >>         "crash",
>>>> >>         "devicehealth",
>>>> >>         "orchestrator",
>>>> >>         "pg_autoscaler",
>>>> >>         "progress",
>>>> >>         "rbd_support",
>>>> >>         "status",
>>>> >>         "telemetry",
>>>> >>         "volumes"
>>>> >>     ],
>>>> >>     "enabled_modules": [
>>>> >>         "cephadm",
>>>> >>         "dashboard",
>>>> >>         "iostat",
>>>> >>         "prometheus",
>>>> >>         "restful"
>>>> >>     ],
>>>> >>
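>>>> >> (Trimmed excerpt - this is the kind of output "ceph mgr module ls"
>>>> >> prints; with jq, something like
>>>> >>
>>>> >> ceph mgr module ls | jq '{always_on_modules, enabled_modules}'
>>>> >>
>>>> >> pulls out just these two lists.)
>>>> >>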
>>>> >> /Z
>>>> >>
>>>> >> On Wed, 22 Nov 2023, 14:52 Eugen Block, <eblock@xxxxxx> wrote:
>>>> >>
>>>> >>> What does your hardware look like memory-wise? Just for comparison,
>>>> >>> one customer cluster has 4.5 GB in use (middle-sized cluster for
>>>> >>> OpenStack, 280 OSDs):
>>>> >>>
>>>> >>>      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>>> >>>     6077 ceph      20   0 6357560 4,522g  22316 S 12,00 1,797  57022:54 ceph-mgr
>>>> >>>
>>>> >>> In our own cluster (smaller than that and not really heavily used)
>>>> >>> the mgr uses almost 2 GB. So those numbers you have seem relatively
>>>> >>> small.
>>>> >>>
>>>> >>> Zitat von Zakhar Kirpichenko <zakhar@xxxxxxxxx>:
>>>> >>>
>>>> >>> > I've disabled the progress module entirely and will see how it
>>>> >>> > goes. Otherwise, mgr memory usage keeps increasing slowly; from
>>>> >>> > past experience it will stabilize at around 1.5-1.6 GB. Other than
>>>> >>> > this event warning, it's unclear what could have caused the random
>>>> >>> > memory ballooning.
>>>> >>> >
>>>> >>> > /Z
>>>> >>> >
>>>> >>> > On Wed, 22 Nov 2023 at 13:07, Eugen Block <eblock@xxxxxx> wrote:
>>>> >>> >
>>>> >>> >> I see these progress messages all the time; I don't think they
>>>> >>> >> cause it, but I might be wrong. You can disable it just to rule
>>>> >>> >> that out.
>>>> >>> >>
>>>> >>> >> Zitat von Zakhar Kirpichenko <zakhar@xxxxxxxxx>:
>>>> >>> >>
>>>> >>> >> > Unfortunately, I don't have a full stack trace because there's
>>>> >>> >> > no crash when the mgr gets oom-killed. There's just the mgr log,
>>>> >>> >> > which looks completely normal until about 2-3 minutes before the
>>>> >>> >> > oom-kill, when tcmalloc warnings show up.
>>>> >>> >> >
>>>> >>> >> > I'm not sure that it's the same issue that is described in the
>>>> >>> >> > tracker. We seem to have some stale "events" in the progress
>>>> >>> >> > module though:
>>>> >>> >> >
>>>> >>> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+0000 7f4bb19ef700  0 [progress WARNING root] complete: ev cacc4230-75ee-4892-b8fd-a19fec8f9f66 does not exist
>>>> >>> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+0000 7f4bb19ef700  0 [progress WARNING root] complete: ev 44824331-3f6b-45c4-b925-423d098c3c76 does not exist
>>>> >>> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+0000 7f4bb19ef700  0 [progress WARNING root] complete: ev 0139bc54-ae42-4483-b278-851d77f23f9f does not exist
>>>> >>> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+0000 7f4bb19ef700  0 [progress WARNING root] complete: ev f9d6c20e-b8d8-4625-b9cf-84da1244c822 does not exist
>>>> >>> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+0000 7f4bb19ef700  0 [progress WARNING root] complete: ev 1486b26d-2a23-4416-a864-2cbb0ecf1429 does not exist
>>>> >>> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+0000 7f4bb19ef700  0 [progress WARNING root] complete: ev 7f14d01c-498c-413f-b2ef-05521050190a does not exist
>>>> >>> >> > Nov 21 14:57:35 ceph01 bash[3941523]: debug 2023-11-21T14:57:35.950+0000 7f4bb19ef700  0 [progress WARNING root] complete: ev 48cbd97f-82f7-4b80-8086-890fff6e0824 does not exist
>>>> >>> >> >
>>>> >>> >> > I tried clearing them but they keep showing up. I am wondering
>>>> >>> >> > if these missing events can cause memory leaks over time.
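>>>> >>> >> >
>>>> >>> >> > (As far as I know, the way to flush these is
>>>> >>> >> >
>>>> >>> >> > ceph progress clear      # reset the progress module's event list
>>>> >>> >> >
>>>> >>> >> > but the warnings come back regardless.)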
>>>> >>> >> >
>>>> >>> >> > /Z
>>>> >>> >> >
>>>> >>> >> > On Wed, 22 Nov 2023 at 11:12, Eugen Block <eblock@xxxxxx> wrote:
>>>> >>> >> >
>>>> >>> >> >> Do you have the full stack trace? The pastebin only contains
>>>> >>> >> >> the "tcmalloc: large alloc" messages (same as in the tracker
>>>> >>> >> >> issue). Maybe comment in the tracker issue directly since Radek
>>>> >>> >> >> asked for someone with a similar problem in a newer release.
>>>> >>> >> >>
>>>> >>> >> >> Zitat von Zakhar Kirpichenko <zakhar@xxxxxxxxx>:
>>>> >>> >> >>
>>>> >>> >> >> > Thanks, Eugen. It is similar in the sense that the mgr is
>>>> >>> >> >> > getting OOM-killed.
>>>> >>> >> >> >
>>>> >>> >> >> > It started happening in our cluster after the upgrade to
>>>> >>> >> >> > 16.2.14. We haven't had this issue with earlier Pacific
>>>> >>> >> >> > releases.
>>>> >>> >> >> >
>>>> >>> >> >> > /Z
>>>> >>> >> >> >
>>>> >>> >> >> >> On Tue, 21 Nov 2023, 21:53 Eugen Block, <eblock@xxxxxx> wrote:
>>>> >>> >> >> >
>>>> >>> >> >> >> Just checking it on the phone, but isn’t this quite similar?
>>>> >>> >> >> >>
>>>> >>> >> >> >> https://tracker.ceph.com/issues/45136
>>>> >>> >> >> >>
>>>> >>> >> >> >> Zitat von Zakhar Kirpichenko <zakhar@xxxxxxxxx>:
>>>> >>> >> >> >>
>>>> >>> >> >> >> > Hi,
>>>> >>> >> >> >> >
>>>> >>> >> >> >> > I'm facing a rather new issue with our Ceph cluster: from
>>>> >>> >> >> >> > time to time ceph-mgr on one of the two mgr nodes gets
>>>> >>> >> >> >> > oom-killed after consuming over 100 GB RAM:
>>>> >>> >> >> >> >
>>>> >>> >> >> >> > [Nov21 15:02] tp_osd_tp invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
>>>> >>> >> >> >> > [  +0.000010] oom_kill_process.cold+0xb/0x10
>>>> >>> >> >> >> > [  +0.000002] [  pid  ]   uid tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
>>>> >>> >> >> >> > [  +0.000008] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=504d37b566d9fd442d45904a00584b4f61c93c5d49dc59eb1c948b3d1c096907,mems_allowed=0-1,global_oom,task_memcg=/docker/3826be8f9115479117ddb8b721ca57585b2bdd58a27c7ed7b38e8d83eb795957,task=ceph-mgr,pid=3941610,uid=167
>>>> >>> >> >> >> > [  +0.000697] Out of memory: Killed process 3941610 (ceph-mgr) total-vm:146986656kB, anon-rss:125340436kB, file-rss:0kB, shmem-rss:0kB, UID:167 pgtables:260356kB oom_score_adj:0
>>>> >>> >> >> >> > [  +6.509769] oom_reaper: reaped process 3941610 (ceph-mgr), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
>>>> >>> >> >> >> >
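>>>> >>> >> >> >> > (An anon-rss of 125340436 kB is roughly 119.5 GiB, i.e.
>>>> >>> >> >> >> > about a third of the node's 384 GB of RAM went to the mgr
>>>> >>> >> >> >> > alone.)
>>>> >>> >> >> >> >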
>>>> >>> >> >> >> > The cluster is stable and operating normally, and there's
>>>> >>> >> >> >> > nothing unusual going on before, during or after the kill,
>>>> >>> >> >> >> > so it's unclear what causes the mgr to balloon, use all RAM
>>>> >>> >> >> >> > and get killed. Systemd logs aren't very helpful: they just
>>>> >>> >> >> >> > show normal mgr operations until it fails to allocate
>>>> >>> >> >> >> > memory and gets killed: https://pastebin.com/MLyw9iVi
>>>> >>> >> >> >> >
>>>> >>> >> >> >> > The mgr experienced this issue several times in the last 2
>>>> >>> >> >> >> > months, and the events don't appear to correlate with any
>>>> >>> >> >> >> > other events in the cluster because basically nothing else
>>>> >>> >> >> >> > happened at around those times. How can I investigate this
>>>> >>> >> >> >> > and figure out what's causing the mgr to consume all memory
>>>> >>> >> >> >> > and get killed?
>>>> >>> >> >> >> >
>>>> >>> >> >> >> > I would very much appreciate any advice!
>>>> >>> >> >> >> >
>>>> >>> >> >> >> > Best regards,
>>>> >>> >> >> >> > Zakhar
>>>> >>> >> >> >> > _______________________________________________
>>>> >>> >> >> >> > ceph-users mailing list -- ceph-users@xxxxxxx
>>>> >>> >> >> >> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>> >>> >> >> >>
>>>> >>> >> >> >>
>>>> >>> >> >> >> _______________________________________________
>>>> >>> >> >> >> ceph-users mailing list -- ceph-users@xxxxxxx
>>>> >>> >> >> >> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>> >>> >> >> >>
>>>> >>> >> >>
>>>> >>> >> >>
>>>> >>> >> >>
>>>> >>> >> >>
>>>> >>> >>
>>>> >>> >>
>>>> >>> >>
>>>> >>> >>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >
>>>> >
>>>> > _______________________________________________
>>>> > ceph-users mailing list -- ceph-users@xxxxxxx
>>>> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>>
>>>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



