Re: OSDs get killed by OOM when other host goes down

I upgraded all the OSDs + mons to Pacific 16.2.6.
All PGs have been active+clean for the last few days, but memory usage is
still quite high:

            "osd_pglog": {

                "items": 35066835,

                "bytes": 3663079160 (3.6 GB)

            },

            "buffer_anon": {

                "items": 346531,

                "bytes": 83548293 (0.08 GB)

            },

        "total": {

            "items": 123515722,

            "bytes": 7888791573 (7.88 GB)

        }


However, docker stats reports 38 GB for that container.

There is a huge gap between the RAM the container is actually using and
what ceph daemon osd.xxx dump_mempools reports.
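
My first guess was page cache: as far as I know docker stats can include
cache in its memory figure, so I was going to compare against the cgroup
counters inside the container (assuming cgroup v1 here; the paths are
different under cgroup v2):

    # rss vs. cache for the container, as the kernel accounts it
    grep -E '^(rss|cache) ' /sys/fs/cgroup/memory/memory.stat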


How can I check if trim happens?
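
What I had in mind so far, assuming the admin socket and the mons are
reachable (not sure these are the right fields to watch):

    # osdmap epoch range the mons still keep; a large difference would
    # suggest osdmaps are not being trimmed
    ceph report | grep -E 'osdmap_first_committed|osdmap_last_committed'

    # oldest/newest osdmap epoch this particular OSD still stores
    ceph daemon osd.xxx status

    # per-PG log lengths (LOG column), since osd_pglog is the big consumer
    ceph pg dump pgs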

How can I check what else is consuming memory in the ceph-osd process?
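
For the second question I was going to start with the tcmalloc heap
commands and the memory target, but I'm not sure how much of the remaining
gap they will explain:

    # tcmalloc view of the heap: bytes in use by the application vs.
    # free pages the allocator has not returned to the kernel
    ceph tell osd.xxx heap stats

    # ask tcmalloc to hand its free pages back to the OS
    ceph tell osd.xxx heap release

    # the limit the OSD's priority cache manager tries to stay under
    ceph config get osd.xxx osd_memory_target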

On Mon, Nov 15, 2021 at 3:50 PM Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx>
wrote:

> Hi Istvan,
>
> > So this means that if we do an operation which involves recovery, we
> > should not start another one until this trimming is done? Let's say I've
> > added a new host full of drives; once the rebalance has finished, should
> > we leave the cluster to trim the osdmap before I add another host?
>
> Ah, no, sorry if I gave the wrong impression. If you have Nautilus
> 14.2.12+, Octopus 15.2.5+, or Pacific, then, as long as you don't have
> any down+in OSDs, osdmaps should be trimmed.
>
> Josh
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


