Re: OSDs get killed by OOM when other host goes down

Also similar: https://tracker.ceph.com/issues/51609

On Fri, Nov 12, 2021 at 5:02 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>
> Hi Marius,
>
> Your mempools show quite high usage in osd_pglog and buffer_anon,
> which reminds me of this issue:
>
> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/7IMIWCKIHXNULEBHVUIXQQGYUDJAO2SF/#Y2225AVEZYCBIJXXATJIJAXUWKNP4H3I
>
> You can configure the pglog size to reduce the memory usage, but I
> don't recall whether anyone ever found the root cause of why the pglog
> size sometimes explodes. (It didn't happen again for us.)
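>
> For example (untested here, and the right values depend on your
> workload), capping the per-PG log length looks something like:
>
>   ceph config set osd osd_max_pg_log_entries 500
>   ceph config set osd osd_min_pg_log_entries 500
>
> IIRC Octopus also has osd_target_pg_log_entries_per_osd to bound the
> total number of entries across the whole OSD.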
>
> -- Dan
>
> On Fri, Nov 12, 2021 at 1:37 PM Marius Leustean <marius.leus@xxxxxxxxx> wrote:
> >
> > Hi Josh,
> >
> > There is 1 OSD per host.
> > There are 3 pools of 256, 128 and 32 PGs (total = 416 PGs across 8 OSDs).
> >
> > ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)
> >
> > I still have one OSD where Docker reports 61GB of RAM consumed by the
> > container (we have a containerized deployment).
> > The dump_mempools output is here: https://paste2.org/yfng2saG (it reports 38GB)
> >
> > All OSDs are currently up+in and the cluster is HEALTH_OK.
> >
> > Thanks!
> >
> > On Fri, Nov 12, 2021 at 2:22 PM Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx>
> > wrote:
> >
> > > Hi Marius,
> > >
> > > > We have an 8-host cluster with one 4TB NVMe drive per host for now.
> > > > The pool size is 2 and it's hosting RBD images for VMs.
> > > > Each host has 128GB of RAM installed.
> > >
> > > How many OSDs/host? How many PGs/OSD? Which Ceph version?
> > >
> > > > What is really happening during recovery/backfills that consumes this
> > > > much memory for a single OSD?
> > >
> > > It would be helpful to see what the "ceph daemon osd.XXX dump_mempools"
> > > command reports for an OSD with high memory usage. One problem that has
> > > been seen is that the pglogs start consuming quite a bit of memory
> > > during recovery scenarios (or even occasionally during steady state).
> > > This issue has been alleviated a bit in Octopus+, where there's a limit
> > > on the number of pglog entries per OSD, but there are still gaps.
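> > >
> > > For example (assuming osd.3 is one of the affected OSD IDs, and the
> > > exact JSON layout varies a bit by release), run inside the OSD
> > > container:
> > >
> > >   ceph daemon osd.3 dump_mempools | jq '.mempool.by_pool.osd_pglog'
> > >
> > > The "bytes" field there is how much the pglog is holding.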
> > >
> > > > Why is the OSD process taking ~100GB of RAM and starting up in
> > > > ~25 minutes even though the recovery process has ended (unless we
> > > > wipe the OSD and register it again)?
> > >
> > > This sounds like a pileup of osdmaps. Depending on your Ceph version,
> > > all OSDs may need to be up+in in order to trim osdmaps effectively.
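> > >
> > > To check (a rough sketch; again osd.3 is just an example ID), compare
> > > the map range each OSD holds against what the monitors have committed:
> > >
> > >   ceph daemon osd.3 status    # reports oldest_map / newest_map
> > >   ceph report | jq '.osdmap_first_committed, .osdmap_last_committed'
> > >
> > > A very wide oldest_map..newest_map span would point at untrimmed
> > > osdmaps.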
> > >
> > > Josh
> > >
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


