Hi Marius,

Your mempools show quite high usage in osd_pglog and buffer_anon, which
reminds me of this issue:
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/7IMIWCKIHXNULEBHVUIXQQGYUDJAO2SF/#Y2225AVEZYCBIJXXATJIJAXUWKNP4H3I

You can configure the pglog size to reduce the memory usage; I've put a
rough sketch of the relevant settings below the quoted thread. But I don't
recall whether anyone ever found the root cause of why the pglog size
sometimes explodes. (It didn't happen again for us.)

--
Dan

On Fri, Nov 12, 2021 at 1:37 PM Marius Leustean <marius.leus@xxxxxxxxx> wrote:
>
> Hi Josh,
>
> There is 1 OSD per host.
> There are 3 pools of 256, 128 and 32 PGs (total = 416 PGs across 8 OSDs).
>
> ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus
> (stable)
>
> I still have 1 OSD where docker reports 61GB RAM being consumed by the
> container (we have a containerized deployment).
> The dump_mempools output is this one: https://paste2.org/yfng2saG
> (it reports 38GB).
>
> All OSDs are currently up+in and the cluster is HEALTH_OK.
>
> Thanks!
>
> On Fri, Nov 12, 2021 at 2:22 PM Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx>
> wrote:
> >
> > Hi Marius,
> >
> > > We have an 8-host cluster with one 4TB NVMe drive per host for now.
> > > The pool size is 2 and it's hosting RBD images for VMs.
> > > Each host has 128GB RAM installed.
> >
> > How many OSDs/host? How many PGs/OSD? Which Ceph version?
> >
> > > What is really happening during recovery/backfill that takes this much
> > > memory for a single OSD?
> >
> > It would be helpful to see what "ceph daemon osd.XXX dump_mempools" says
> > for an OSD with high memory. One problem that has been seen is that the
> > pglogs start consuming quite a bit of memory during recovery scenarios
> > (or even occasionally during steady state). This issue has been
> > alleviated a bit in Octopus+, where there's a limit on the number of
> > pglog entries per OSD, but there are still gaps.
> >
> > > Why is the OSD process taking ~100GB RAM and needing a ~25 min start
> > > time even after the recovery process has ended? (Unless we wipe it and
> > > register it again.)
> >
> > This sounds like a pileup of osdmaps. Depending on your Ceph version,
> > all OSDs may need to be up+in in order to trim osdmaps effectively.
> >
> > Josh
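
As promised above, a rough sketch of the pglog settings. This is from
memory against Octopus, and the numbers are only examples, not
recommendations; check the defaults for your exact release before rolling
anything out cluster-wide:

  # Cap the per-PG log length. Shorter logs mean less osd_pglog memory,
  # but also more backfill (instead of log-based recovery) after outages.
  # Example values only.
  ceph config set osd osd_min_pg_log_entries 100
  ceph config set osd osd_max_pg_log_entries 500

  # Octopus additionally trims toward a per-OSD total across all PGs:
  ceph config set osd osd_target_pg_log_entries_per_osd 100000

Existing logs only get trimmed as PGs take new writes or re-peer, so it
can take a while (or an OSD restart) before the osd_pglog mempool actually
shrinks.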
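
If you want to rank which mempools are actually eating the RAM rather than
eyeballing the paste, something along these lines works against the admin
socket (the jq path matches the JSON layout as I remember it on Octopus,
so adjust it if your output nests differently):

  # Show the five largest mempools with their item counts and byte totals.
  ceph daemon osd.0 dump_mempools | \
    jq '.mempool.by_pool | to_entries | sort_by(.value.bytes) | reverse | .[0:5]'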
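
And for the osdmap pileup Josh mentioned: you can compare the oldest and
newest map epoch an OSD is still holding (field names as I remember them
from the OSD's admin-socket status output):

  # A gap of tens of thousands of epochs between these two fields means
  # old maps aren't being trimmed, which inflates both RSS and startup
  # time. They normally only trim once all OSDs are up+in.
  ceph daemon osd.0 status | grep -E 'oldest_map|newest_map'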