Re: OSDs getting OOM-killed right after startup

Did you check the mempools?

ceph daemon osd.X dump_mempools

This will tell you how much memory is consumed by different components of
the OSD.
Finger in the air, your RAM might be consumed by the pg_log.
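
If you want to look at just the pg_log pool, here is a quick sketch (assuming
jq is available on the host and osd.X is reachable via its admin socket):

ceph daemon osd.X dump_mempools | jq '.mempool.by_pool.osd_pglog'

That should print the item count and byte total for the osd_pglog pool, which
you can compare against the total at the end of the full dump.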

If osd_pglog in the dump_mempools output is big, you can lower the values of
the related configuration options:

osd_min_pg_log_entries = 100 (default is 250)
osd_max_pg_log_entries = 500 (default is 10000)
osd_target_pg_log_entries_per_osd = 30000 (default is 300000)

Those are just example values. You can adjust them based on your current
memory consumption and the amount of RAM available.
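
If you prefer not to edit ceph.conf, something like the following should work
at runtime (just a sketch using the example values from above - adjust them to
your situation):

ceph config set osd osd_min_pg_log_entries 100
ceph config set osd osd_max_pg_log_entries 500
ceph config set osd osd_target_pg_log_entries_per_osd 30000

As far as I know, existing pg_log entries are only trimmed over time as new
writes come in, so the memory might not drop immediately after changing these.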

On Fri, Jun 10, 2022 at 1:21 PM Eugen Block <eblock@xxxxxx> wrote:

> Hi,
>
> could you share more cluster details and what your workload is?
>
> ceph -s
> ceph osd df tree
> ceph orch ls
> ceph osd pool ls detail
>
> How big are your PGs?
>
> Zitat von Mara Sophie Grosch <littlefox@xxxxxxxxxx>:
>
> > Hi,
> >
> > good catch with the way too low memory target - I wanted to configure 1
> > GiB, not 1 MiB. I'm aware it's low, but I removed it anyway for testing -
> > it sadly didn't change anything.
> >
> > I customize the config mostly for dealing with problems I have; something
> > in my setup makes the OSDs eat lots of memory in normal operation, just
> > gradually increasing .. I'd send metrics if my monitoring was up again
> > ^^' The suspicion that it might be some form of cache was the reason for
> > that config line (it does not seem to be cache).
> >
> > I have found a way to deal with my cluster: using ceph-objectstore-tool
> > to export-remove all PGs from the OSDs, getting them online and happy
> > again, and then importing a few PGs at a time into one OSD and letting it
> > backfill to the others.
> >
> > The problem of eating very much memory on startup manifests only with
> > some of the PGs, but for those it goes up to ~50GiB. Problematic: when an
> > OSD has multiple of those PGs in its storage, it handles them in
> > parallel - needing even more memory of course, until it finally gets
> > OOM-killed.
> >
> > So.. it seems I can get my cluster running again, only limited by my
> > internet upload now. Any hints on why it eats a lot of memory in normal
> > operation would still be appreciated.
> >
> > Best, Mara
> >
> >
> > Am Wed, Jun 08, 2022 at 09:05:52AM +0000 schrieb Eugen Block:
> >> It's even worse: you only give them 1 MB, not 1 GB.
> >>
> >> Zitat von Eugen Block <eblock@xxxxxx>:
> >>
> >>> Hi,
> >>>
> >>> Is there any reason you use custom configs? Most of the defaults
> >>> work well. But you only give your OSDs 1 GB of memory, which is way
> >>> too low except for an idle cluster without much data. I recommend
> >>> removing the line
> >>>
> >>>   osd_memory_target = 1048576
> >>>
> >>> and letting ceph handle it. I haven't installed Quincy yet, but in a
> >>> healthy cluster the OSDs will take around 3 GB of memory, maybe 4,
> >>> so you should be good with your setup.
> >>>
> >>> Regards,
> >>> Eugen
> >>>
> >>> Zitat von Mara Sophie Grosch <littlefox@xxxxxxxxxx>:
> >>>
> >>>> Hi,
> >>>>
> >>>> I have a currently-down ceph cluster
> >>>> * v17.2.0 / quay.io/v17.2.0-20220420
> >>>> * 3 nodes, 4 OSDs
> >>>> * around 1TiB used/3TiB total
> >>>> * probably enough resources
> >>>> - two of those nodes have 64GiB memory, the third has 16GiB
> >>>> - one of the 64GiB nodes runs two OSDs, as it's a physical node with
> >>>>  2 NVMe drives
> >>>> * provisioned via Rook and running in my Kubernetes cluster
> >>>>
> >>>> After some upgrades yesterday (system packages on the nodes) and today
> >>>> (Kubernetes to latest version), I wanted to reboot my nodes. The drain
> >>>> of the first node put a lot of stress on the other OSDs, making them go
> >>>> OOM - but I think that probably is a bug already, as at least one of
> >>>> those nodes has enough resources (64GiB memory, physical machine,
> >>>> ~40GiB surely free - but don't have metrics rn as everything is down).
> >>>>
> >>>> I'm now seeing all OSDs going into OOM right on startup, from what it
> >>>> looks like everything is fine until right after `load_pgs` - as soon as
> >>>> it activates some PGs, memory usage increases _a lot_ (from ~4-5GiB
> >>>> RES before to .. 60GiB, though that depends on the free memory on the
> >>>> node).
> >>>>
> >>>> Because of this, I cannot get any of them online again and need advice
> >>>> what to do and what info might be useful. Logs of one of those OSDs are
> >>>> here[1] (captured via kubectl logs, so something right from start might
> >>>> be missing - happy to dig deeper if you need more) and my changed
> >>>> ceph.conf entries are here[2]. I had `bluefs_buffered_io = false` until
> >>>> today, changed it to true based on a suggestion in another debug
> >>>> thread[3]
> >>>>
> >>>> Any hint is greatly appreciated, many thanks
> >>>> Mara Grosch
> >>>>
> >>>> [1] https://pastebin.com/VFczNqUk
> >>>> [2] https://pastebin.com/QXust5XD
> >>>> [3]
> >>>> https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/CBPXLPWEVZLZE55WAQSMB7KSIQPV5I76/
> >>
> >>
> >>
>
>
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


