Re: Ceph 16.2.14: ceph-mgr getting oom-killed

Adrien Georget <adrien.georget@xxxxxxxxxxx> · Wed, 22 Nov 2023 14:22:55 +0100

Hi,

This memory leak with ceph-mgr seems to be due to a change in Ceph 16.2.12.
Check this issue : https://tracker.ceph.com/issues/59580
We are also affected by this, with or without containerized services.

Cheers,
Adrien

Le 22/11/2023 à 14:14, Eugen Block a écrit :
One other difference is you use docker, right? We use podman, could it 
be some docker restriction?

Zitat von Zakhar Kirpichenko <zakhar@xxxxxxxxx>:

It's a 6-node cluster with 96 OSDs, not much I/O, mgr . Each node has 
384
GB of RAM, each OSD has a memory target of 16 GB, about 100 GB of 
memory,
give or take, is available (mostly used by page cache) on each node 
during
normal operation. Nothing unusual there, tbh.

No unusual mgr modules or settings either, except for disabled progress:

{
    "always_on_modules": [
        "balancer",
        "crash",
        "devicehealth",
        "orchestrator",
        "pg_autoscaler",
        "progress",
        "rbd_support",
        "status",
        "telemetry",
        "volumes"
    ],
    "enabled_modules": [
        "cephadm",
        "dashboard",
        "iostat",
        "prometheus",
        "restful"
    ],

/Z

On Wed, 22 Nov 2023, 14:52 Eugen Block, <eblock@xxxxxx> wrote:

What does your hardware look like memory-wise? Just for comparison,
one customer cluster has 4,5 GB in use (middle-sized cluster for
openstack, 280 OSDs):

     PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+
COMMAND
    6077 ceph      20   0 6357560 4,522g  22316 S 12,00 1,797
57022:54 ceph-mgr

In our own cluster (smaller than that and not really heavily used) the
mgr uses almost 2 GB. So those numbers you have seem relatively small.

Zitat von Zakhar Kirpichenko <zakhar@xxxxxxxxx>:

> I've disabled the progress module entirely and will see how it goes.
> Otherwise, mgr memory usage keeps increasing slowly, from past 
experience
> it will stabilize at around 1.5-1.6 GB. Other than this event 
warning,
it's
> unclear what could have caused random memory ballooning.
>
> /Z
>
> On Wed, 22 Nov 2023 at 13:07, Eugen Block <eblock@xxxxxx> wrote:
>
>> I see these progress messages all the time, I don't think they cause
>> it, but I might be wrong. You can disable it just to rule that out.
>>
>> Zitat von Zakhar Kirpichenko <zakhar@xxxxxxxxx>:
>>
>> > Unfortunately, I don't have a full stack trace because there's no
crash
>> > when the mgr gets oom-killed. There's just the mgr log, which 
looks
>> > completely normal until about 2-3 minutes before the oom-kill, 
when
>> > tmalloc warnings show up.
>> >
>> > I'm not sure that it's the same issue that is described in the
tracker.
>> We
>> > seem to have some stale "events" in the progress module though:
>> >
>> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
2023-11-21T14:56:30.718+0000
>> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
>> > cacc4230-75ee-4892-b8fd-a19fec8f9f66 does not exist
>> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
2023-11-21T14:56:30.718+0000
>> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
>> > 44824331-3f6b-45c4-b925-423d098c3c76 does not exist
>> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
2023-11-21T14:56:30.718+0000
>> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
>> > 0139bc54-ae42-4483-b278-851d77f23f9f does not exist
>> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
2023-11-21T14:56:30.718+0000
>> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
>> > f9d6c20e-b8d8-4625-b9cf-84da1244c822 does not exist
>> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
2023-11-21T14:56:30.718+0000
>> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
>> > 1486b26d-2a23-4416-a864-2cbb0ecf1429 does not exist
>> > Nov 21 14:56:30 ceph01 bash[3941523]: debug
2023-11-21T14:56:30.718+0000
>> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
>> > 7f14d01c-498c-413f-b2ef-05521050190a does not exist
>> > Nov 21 14:57:35 ceph01 bash[3941523]: debug
2023-11-21T14:57:35.950+0000
>> > 7f4bb19ef700  0 [progress WARNING root] complete: ev
>> > 48cbd97f-82f7-4b80-8086-890fff6e0824 does not exist
>> >
>> > I tried clearing them but they keep showing up. I am wondering if
these
>> > missing events can cause memory leaks over time.
>> >
>> > /Z
>> >
>> > On Wed, 22 Nov 2023 at 11:12, Eugen Block <eblock@xxxxxx> wrote:
>> >
>> >> Do you have the full stack trace? The pastebin only contains the
>> >> "tcmalloc: large alloc" messages (same as in the tracker issue).
Maybe
>> >> comment in the tracker issue directly since Radek asked for 
someone
>> >> with a similar problem in a newer release.
>> >>
>> >> Zitat von Zakhar Kirpichenko <zakhar@xxxxxxxxx>:
>> >>
>> >> > Thanks, Eugen. It is similar in the sense that the mgr is 
getting
>> >> > OOM-killed.
>> >> >
>> >> > It started happening in our cluster after the upgrade to 
16.2.14.
We
>> >> > haven't had this issue with earlier Pacific releases.
>> >> >
>> >> > /Z
>> >> >
>> >> > On Tue, 21 Nov 2023, 21:53 Eugen Block, <eblock@xxxxxx> wrote:
>> >> >
>> >> >> Just checking it on the phone, but isn’t this quite similar?
>> >> >>
>> >> >> https://tracker.ceph.com/issues/45136
>> >> >>
>> >> >> Zitat von Zakhar Kirpichenko <zakhar@xxxxxxxxx>:
>> >> >>
>> >> >> > Hi,
>> >> >> >
>> >> >> > I'm facing a rather new issue with our Ceph cluster: from 
time
to
>> time
>> >> >> > ceph-mgr on one of the two mgr nodes gets oom-killed after
>> consuming
>> >> over
>> >> >> > 100 GB RAM:
>> >> >> >
>> >> >> > [Nov21 15:02] tp_osd_tp invoked oom-killer:
>> >> >> > gfp_mask=0x100cca(GFP_HIGHUSER_MOVABLE), order=0,
oom_score_adj=0
>> >> >> > [  +0.000010] oom_kill_process.cold+0xb/0x10
>> >> >> > [  +0.000002] [  pid  ]   uid tgid total_vm      rss
>> pgtables_bytes
>> >> >> > swapents oom_score_adj name
>> >> >> > [  +0.000008]
>> >> >> >
>> >> >>
>> >>
>>
oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=504d37b566d9fd442d45904a00584b4f61c93c5d49dc59eb1c948b3d1c096907,mems_allowed=0-1,global_oom,task_memcg=/docker/3826be8f9115479117ddb8b721ca57585b2bdd58a27c7ed7b38e8d83eb795957,task=ceph-mgr,pid=3941610,uid=167 

>> >> >> > [  +0.000697] Out of memory: Killed process 3941610 
(ceph-mgr)
>> >> >> > total-vm:146986656kB, anon-rss:125340436kB, file-rss:0kB,
>> >> shmem-rss:0kB,
>> >> >> > UID:167 pgtables:260356kB oom_score_adj:0
>> >> >> > [  +6.509769] oom_reaper: reaped process 3941610 
(ceph-mgr), now
>> >> >> > anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
>> >> >> >
>> >> >> > The cluster is stable and operating normally, there's 
nothing
>> unusual
>> >> >> going
>> >> >> > on before, during or after the kill, thus it's unclear what
causes
>> the
>> >> >> mgr
>> >> >> > to balloon, use all RAM and get killed. Systemd logs 
aren't very
>> >> helpful:
>> >> >> > they just show normal mgr operations until it fails to 
allocate
>> memory
>> >> >> and
>> >> >> > gets killed: https://pastebin.com/MLyw9iVi
>> >> >> >
>> >> >> > The mgr experienced this issue several times in the last 2
months,
>> and
>> >> >> the
>> >> >> > events don't appear to correlate with any other events in 
the
>> cluster
>> >> >> > because basically nothing else happened at around those 
times.
How
>> >> can I
>> >> >> > investigate this and figure out what's causing the mgr to
consume
>> all
>> >> >> > memory and get killed?
>> >> >> >
>> >> >> > I would very much appreciate any advice!
>> >> >> >
>> >> >> > Best regards,
>> >> >> > Zakhar
>> >> >> > _______________________________________________
>> >> >> > ceph-users mailing list -- ceph-users@xxxxxxx
>> >> >> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>> >> >>
>> >> >>
>> >> >> _______________________________________________
>> >> >> ceph-users mailing list -- ceph-users@xxxxxxx
>> >> >> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>> >> >>
>> >>
>> >>
>> >>
>> >>
>>
>>
>>
>>

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx