Hi Josh,

There is 1 OSD per host. There are 3 pools of 256, 128, and 32 PGs (total = 416 PGs across 8 OSDs).

ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)

I still have 1 OSD where docker reports 61GB of RAM consumed by the container (we have a containerized deployment). The dump_mempools output is this one: https://paste2.org/yfng2saG (it reports 38GB).

All OSDs are currently up+in and the cluster is HEALTH_OK.

Thanks!

On Fri, Nov 12, 2021 at 2:22 PM Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx> wrote:
> Hi Marius,
>
> > We have an 8-host cluster with one 4TB NVMe drive per host for now.
> > The pool size is 2 and it's hosting RBD images for VMs.
> > Each host has 128GB RAM installed.
>
> How many OSDs/host? How many PGs/OSD? Which Ceph version?
>
> > What is really happening during recovery / backfills that takes this
> > much memory for 1 single OSD?
>
> It would be helpful to see what the "ceph daemon osd.XXX dump_mempools"
> command says for an OSD with high memory. One problem that has been
> seen is that pglogs start consuming quite a bit of memory during
> recovery scenarios (or even occasionally during steady state). This
> issue has been alleviated a bit in Octopus+, where there's a limit on
> the number of pglog entries per OSD, but there are still gaps.
>
> > Why is the OSD process taking ~100GB RAM and having a 25min start
> > time even if the recovery process ended? (unless we wipe it and
> > register it again)
>
> This sounds like a pileup of osdmaps. Depending on your Ceph version,
> all OSDs may need to be up+in in order to trim osdmaps effectively.
>
> Josh
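
For anyone reading the paste: a minimal sketch for pulling the pglog
figure out of dump_mempools, assuming the Octopus JSON layout (a
top-level "mempool" object with "by_pool" entries) and that jq is
available on the host; osd.0 is a placeholder id:

    # Bytes currently held by pglog entries on this OSD
    ceph daemon osd.0 dump_mempools | jq '.mempool.by_pool.osd_pglog.bytes'

    # Compare against the total across all mempools
    ceph daemon osd.0 dump_mempools | jq '.mempool.total.bytes'

If osd_pglog accounts for most of the 38GB, that matches the pglog
behaviour Josh describes rather than, say, the bluestore caches.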
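
The per-OSD pglog limit Josh mentions is, as far as I know, the
osd_target_pg_log_entries_per_osd option (option name assumed here;
worth confirming with "ceph config help" before relying on it):

    # Inspect the assumed per-OSD pglog cap and its documentation
    ceph config get osd osd_target_pg_log_entries_per_osd
    ceph config help osd_target_pg_log_entries_per_osd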
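
And to check for the osdmap pileup: a sketch assuming "ceph report"
exposes the osdmap_first_committed / osdmap_last_committed fields (true
on recent releases; worth verifying on 15.2.14):

    # A large gap between these values means many old osdmaps are still
    # retained (and get replayed at OSD start, hence the slow startup)
    ceph report 2>/dev/null | jq '.osdmap_first_committed, .osdmap_last_committed'

Once all OSDs are up+in, the gap should shrink as the monitors trim old
maps.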