Re: Ceph 16.2.14: ceph-mgr getting oom-killed

Hi,

A small update: after disabling the 'progress' module, the active mgr (on
ceph01) used ~1.3 GB of memory in 3 days, which was expected:

mgr.ceph01.vankui     ceph01  *:8443,9283  running (3d)      9m ago   2y   1284M        -  16.2.14  fc0182d6cda5  3451f8c6c07e
mgr.ceph02.shsinf     ceph02  *:8443,9283  running (3d)      9m ago   7M    374M        -  16.2.14  fc0182d6cda5  1c3d2d83b6df
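
(For context: 'progress' is an always-on module, so 'ceph mgr module
disable progress' is refused; it is switched off with the module's own
command, something like:)

    # turn progress event tracking off; 'ceph progress on' re-enables it
    ceph progress off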

The cluster is healthy and operating normally. The mgr process is growing
slowly, at roughly 1-2 MB per 10 minutes, which is not fast enough to
balloon to over 100 GB RSS within several days. This likely means that
whatever triggers the issue happens randomly and quite suddenly. I'll
continue monitoring the mgr and report back with more observations.
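
For reference, the growth can be tracked with a trivial loop along these
lines (the log path is just an example):

    # record the ceph-mgr RSS (in kB) every 10 minutes
    while true; do
        echo "$(date -Is) $(ps -o rss= -C ceph-mgr)" >> /tmp/mgr-rss.log
        sleep 600
    done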

/Z

On Wed, 22 Nov 2023 at 16:33, Zakhar Kirpichenko <zakhar@xxxxxxxxx> wrote:

> Thanks for this. This looks similar to what we're observing, although we
> don't use the API apart from its use by the Ceph deployment itself, which I
> guess still counts.
>
> /Z
>
> On Wed, 22 Nov 2023, 15:22 Adrien Georget, <adrien.georget@xxxxxxxxxxx>
> wrote:
>
>> Hi,
>>
>> This memory leak with ceph-mgr seems to be due to a change in Ceph
>> 16.2.12.
>> Check this issue: https://tracker.ceph.com/issues/59580
>> We are also affected by this, with or without containerized services.
>>
>> Cheers,
>> Adrien
>>
>> On 22/11/2023 at 14:14, Eugen Block wrote:
>> > One other difference is that you use docker, right? We use podman;
>> > could it be some docker restriction?
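>> >
>> > (If it were a container-side limit, something like this should show it,
>> > assuming docker; 0 means no memory limit is set:)
>> >
>> >     docker inspect --format '{{.HostConfig.Memory}}' <mgr-container>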
>> >
>> > Quoting Zakhar Kirpichenko <zakhar@xxxxxxxxx>:
>> >
>> >> It's a 6-node cluster with 96 OSDs and not much I/O. Each node has
>> >> 384 GB of RAM, each OSD has a memory target of 16 GB, and about
>> >> 100 GB of memory, give or take, is available (mostly used by page
>> >> cache) on each node during normal operation. Nothing unusual there, tbh.
>> >>
>> >> No unusual mgr modules or settings either, except for the disabled
>> >> 'progress' module:
>> >>
>> >> {
>> >>     "always_on_modules": [
>> >>         "balancer",
>> >>         "crash",
>> >>         "devicehealth",
>> >>         "orchestrator",
>> >>         "pg_autoscaler",
>> >>         "progress",
>> >>         "rbd_support",
>> >>         "status",
>> >>         "telemetry",
>> >>         "volumes"
>> >>     ],
>> >>     "enabled_modules": [
>> >>         "cephadm",
>> >>         "dashboard",
>> >>         "iostat",
>> >>         "prometheus",
>> >>         "restful"
>> >>     ],
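>> >>
>> >> (The listing above is trimmed from the module listing output, e.g.:)
>> >>
>> >>     ceph mgr module ls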
>> >>
>> >> /Z
>> >>
>> >> On Wed, 22 Nov 2023, 14:52 Eugen Block, <eblock@xxxxxx> wrote:
>> >>
>> >>> What does your hardware look like memory-wise? Just for comparison,
>> >>> one customer cluster has 4.5 GB in use (middle-sized cluster for
>> >>> OpenStack, 280 OSDs):
>> >>>
>> >>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>> >>>    6077 ceph      20   0 6357560 4,522g  22316 S 12,00 1,797  57022:54 ceph-mgr
>> >>>
>> >>> In our own cluster (smaller than that and not really heavily used) the
>> >>> mgr uses almost 2 GB. So those numbers you have seem relatively small.
>> >>>
>> >>> Quoting Zakhar Kirpichenko <zakhar@xxxxxxxxx>:
>> >>>
>> >>> > I've disabled the progress module entirely and will see how it goes.
>> >>> > Otherwise, mgr memory usage keeps increasing slowly; from past
>> >>> > experience it will stabilize at around 1.5-1.6 GB. Other than this
>> >>> > event warning, it's unclear what could have caused the random memory
>> >>> > ballooning.
>> >>> >
>> >>> > /Z
>> >>> >
>> >>> > On Wed, 22 Nov 2023 at 13:07, Eugen Block <eblock@xxxxxx> wrote:
>> >>> >
>> >>> >> I see these progress messages all the time; I don't think they
>> >>> >> cause it, but I might be wrong. You can disable it just to rule
>> >>> >> that out.
>> >>> >>
>> >>> >> Quoting Zakhar Kirpichenko <zakhar@xxxxxxxxx>:
>> >>> >>
>> >>> >> > Unfortunately, I don't have a full stack trace because there's
>> >>> >> > no crash when the mgr gets oom-killed. There's just the mgr log,
>> >>> >> > which looks completely normal until about 2-3 minutes before the
>> >>> >> > oom-kill, when tcmalloc warnings show up.
>> >>> >> >
>> >>> >> > I'm not sure that it's the same issue that is described in the
>> >>> >> > tracker. We seem to have some stale "events" in the progress
>> >>> >> > module though:
>> >>> >> >
>> >>> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+0000 7f4bb19ef700  0 [progress WARNING root] complete: ev cacc4230-75ee-4892-b8fd-a19fec8f9f66 does not exist
>> >>> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+0000 7f4bb19ef700  0 [progress WARNING root] complete: ev 44824331-3f6b-45c4-b925-423d098c3c76 does not exist
>> >>> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+0000 7f4bb19ef700  0 [progress WARNING root] complete: ev 0139bc54-ae42-4483-b278-851d77f23f9f does not exist
>> >>> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+0000 7f4bb19ef700  0 [progress WARNING root] complete: ev f9d6c20e-b8d8-4625-b9cf-84da1244c822 does not exist
>> >>> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+0000 7f4bb19ef700  0 [progress WARNING root] complete: ev 1486b26d-2a23-4416-a864-2cbb0ecf1429 does not exist
>> >>> >> > Nov 21 14:56:30 ceph01 bash[3941523]: debug 2023-11-21T14:56:30.718+0000 7f4bb19ef700  0 [progress WARNING root] complete: ev 7f14d01c-498c-413f-b2ef-05521050190a does not exist
>> >>> >> > Nov 21 14:57:35 ceph01 bash[3941523]: debug 2023-11-21T14:57:35.950+0000 7f4bb19ef700  0 [progress WARNING root] complete: ev 48cbd97f-82f7-4b80-8086-890fff6e0824 does not exist
>> >>> >> >
>> >>> >> > I tried clearing them but they keep showing up. I am wondering
>> >>> >> > if these missing events can cause memory leaks over time.
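>> >>> >> >
>> >>> >> > (For reference, the events can be listed and flushed with the
>> >>> >> > progress module's own commands, roughly as follows, assuming the
>> >>> >> > standard Pacific command set:)
>> >>> >> >
>> >>> >> >     # dump the module's current event list, then try to clear it
>> >>> >> >     ceph progress json
>> >>> >> >     ceph progress clear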
>> >>> >> >
>> >>> >> > /Z
>> >>> >> >
>> >>> >> > On Wed, 22 Nov 2023 at 11:12, Eugen Block <eblock@xxxxxx> wrote:
>> >>> >> >
>> >>> >> >> Do you have the full stack trace? The pastebin only contains
>> >>> >> >> the "tcmalloc: large alloc" messages (same as in the tracker
>> >>> >> >> issue). Maybe comment in the tracker issue directly, since Radek
>> >>> >> >> asked for someone with a similar problem in a newer release.
>> >>> >> >>
>> >>> >> >> Quoting Zakhar Kirpichenko <zakhar@xxxxxxxxx>:
>> >>> >> >>
>> >>> >> >> > Thanks, Eugen. It is similar in the sense that the mgr is
>> >>> >> >> > getting OOM-killed.
>> >>> >> >> >
>> >>> >> >> > It started happening in our cluster after the upgrade to
>> >>> >> >> > 16.2.14. We haven't had this issue with earlier Pacific
>> >>> >> >> > releases.
>> >>> >> >> >
>> >>> >> >> > /Z
>> >>> >> >> >
>> >>> >> >> >> On Tue, 21 Nov 2023, 21:53 Eugen Block, <eblock@xxxxxx> wrote:
>> >>> >> >> >
>> >>> >> >> >> Just checking it on the phone, but isn’t this quite similar?
>> >>> >> >> >>
>> >>> >> >> >> https://tracker.ceph.com/issues/45136
>> >>> >> >> >>
>> >>> >> >> >> Quoting Zakhar Kirpichenko <zakhar@xxxxxxxxx>:
>> >>> >> >> >>
>> >>> >> >> >> > Hi,
>> >>> >> >> >> >
>> >>> >> >> >> > I'm facing a rather new issue with our Ceph cluster: from
>> >>> >> >> >> > time to time ceph-mgr on one of the two mgr nodes gets
>> >>> >> >> >> > oom-killed after consuming over 100 GB RAM:
>> >>> >> >> >> >
>> >>> >> >> >> > [Nov21 15:02] tp_osd_tp invoked oom-killer: gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
>> >>> >> >> >> > [  +0.000010] oom_kill_process.cold+0xb/0x10
>> >>> >> >> >> > [  +0.000002] [  pid  ]   uid tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
>> >>> >> >> >> > [  +0.000008] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=504d37b566d9fd442d45904a00584b4f61c93c5d49dc59eb1c948b3d1c096907,mems_allowed=0-1,global_oom,task_memcg=/docker/3826be8f9115479117ddb8b721ca57585b2bdd58a27c7ed7b38e8d83eb795957,task=ceph-mgr,pid=3941610,uid=167
>> >>> >> >> >> > [  +0.000697] Out of memory: Killed process 3941610 (ceph-mgr) total-vm:146986656kB, anon-rss:125340436kB, file-rss:0kB, shmem-rss:0kB, UID:167 pgtables:260356kB oom_score_adj:0
>> >>> >> >> >> > [  +6.509769] oom_reaper: reaped process 3941610 (ceph-mgr), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
>> >>> >> >> >> >
>> >>> >> >> >> > The cluster is stable and operating normally; there's
>> >>> >> >> >> > nothing unusual going on before, during or after the kill,
>> >>> >> >> >> > so it's unclear what causes the mgr to balloon, use all RAM
>> >>> >> >> >> > and get killed. Systemd logs aren't very helpful: they just
>> >>> >> >> >> > show normal mgr operations until the mgr fails to allocate
>> >>> >> >> >> > memory and gets killed: https://pastebin.com/MLyw9iVi
>> >>> >> >> >> >
>> >>> >> >> >> > The mgr experienced this issue several times in the last
>> >>> >> >> >> > 2 months, and the events don't appear to correlate with any
>> >>> >> >> >> > other events in the cluster, because basically nothing else
>> >>> >> >> >> > happened at around those times. How can I investigate this
>> >>> >> >> >> > and figure out what's causing the mgr to consume all memory
>> >>> >> >> >> > and get killed?
>> >>> >> >> >> >
>> >>> >> >> >> > I would very much appreciate any advice!
>> >>> >> >> >> >
>> >>> >> >> >> > Best regards,
>> >>> >> >> >> > Zakhar
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



