Re: OSDs getting OOM-killed right after startup

Are you thinking it might be a variant of
https://tracker.ceph.com/issues/53729 ? There are some comments on that
tracker describing how to check for the issue; comments #53 and #65 list a
few potential ways to verify.

On Fri, Jun 10, 2022 at 5:32 AM Marius Leustean <marius.leus@xxxxxxxxx>
wrote:

> Did you check the mempools?
>
> ceph daemon osd.X dump_mempools
>
> This will tell you how much memory is consumed by different components of
> the OSD.
> Finger in the air, your RAM might be consumed by the pg_log.
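>
> As a quick sanity check - assuming jq is available and that your release
> wraps the output in a top-level "mempool" key (recent ones do; adjust the
> path otherwise) - something like this pulls out just the pg_log numbers:
>
> ceph daemon osd.X dump_mempools | jq '.mempool.by_pool.osd_pglog'
>
> The "bytes" field is the interesting one; if it accounts for most of your
> osd_memory_target, the pg_log is the likely culprit.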
>
> If osd_pglog from the dump_mempools output is big, then you can lower the
> values of the related configuration options:
>
> osd_min_pg_log_entries = 100 (default is 250)
> osd_max_pg_log_entries = 500 (default is 10000)
> osd_target_pg_log_entries_per_osd = 30000 (default is 300000)
>
> Those are just examples; adjust them based on your current memory
> consumption and the amount of RAM you have available.
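>
> If you want to try that at runtime instead of editing ceph.conf, the
> equivalent (with the example values from above, not a recommendation for
> every cluster) would be something like:
>
> ceph config set osd osd_min_pg_log_entries 100
> ceph config set osd osd_max_pg_log_entries 500
> ceph config set osd osd_target_pg_log_entries_per_osd 30000
>
> The effect is not immediate, since the logs only shrink as the PGs trim
> them.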
>
> On Fri, Jun 10, 2022 at 1:21 PM Eugen Block <eblock@xxxxxx> wrote:
>
> > Hi,
> >
> > could you share more cluster details and what your workload is?
> >
> > ceph -s
> > ceph osd df tree
> > ceph orch ls
> > ceph osd pool ls detail
> >
> > How big are your PGs?
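> >
> > The per-PG sizes and log lengths show up in the BYTES and LOG columns of
> >
> > ceph pg ls
> >
> > so sorting that output by BYTES quickly shows whether a few huge PGs
> > stand out.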
> >
> > Zitat von Mara Sophie Grosch <littlefox@xxxxxxxxxx>:
> >
> > > Hi,
> > >
> > > good catch with the way too low memory target, I wanted to configure
> > > 1 GiB, not 1 MiB. I'm aware it's low, but I removed the setting anyway
> > > for testing - sadly, it didn't change anything.
> > >
> > > I customize the config mostly to deal with problems I have: something
> > > in my setup makes the OSDs eat lots of memory in normal operation, just
> > > gradually increasing... I'd send metrics if my monitoring was up again
> > > ^^' The suspicion that it might be some form of cache was the reason
> > > for that config line (it does not seem to be cache).
> > >
> > > I have found a way to deal with my cluster: using ceph-objectstore-tool
> > > to export-remove all PGs from the OSDs, getting them online and happy
> > > again, and then importing a few PGs at a time into one OSD and letting
> > > it backfill to the others.
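> > >
> > > For the archives, that was roughly the following (with the OSD stopped,
> > > and the data path, PG id and file name being placeholders for whatever
> > > applies to your setup):
> > >
> > > ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-N \
> > >     --pgid <pgid> --op export-remove --file /backup/<pgid>.export
> > > ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-M \
> > >     --op import --file /backup/<pgid>.export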
> > >
> > > The problem of eating very much memory on startup manifests with only
> > > some of the PGs, but for those it goes up to ~50GiB. Problematic: when
> > > an OSD has multiple of those PGs in its storage, it handles them in
> > > parallel - needing even more memory, of course, until it finally gets
> > > OOM-killed.
> > >
> > > So... it seems I can get my cluster running again, limited only by my
> > > internet upload now. Any hints as to why it eats a lot of memory in
> > > normal operation would still be appreciated.
> > >
> > > Best, Mara
> > >
> > >
> > > Am Wed, Jun 08, 2022 at 09:05:52AM +0000 schrieb Eugen Block:
> > >> It's even worse, you only give them 1MB, not GB.
> > >>
> > >> Zitat von Eugen Block <eblock@xxxxxx>:
> > >>
> > >>> Hi,
> > >>>
> > >>> is there any reason you use custom configs? Most of the defaults
> > >>> work well. But you only give your OSDs 1 GB of memory, which is way
> > >>> too low except for an idle cluster without much data. I recommend
> > >>> removing the line
> > >>>
> > >>>   osd_memory_target = 1048576
> > >>>
> > >>> and let Ceph handle it. I haven't installed Quincy yet, but in a
> > >>> healthy cluster the OSDs will take around 3 GB of memory, maybe 4,
> > >>> so you should be good with your setup.
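> > >>>
> > >>> If the target was also set in the mon config store (not only in
> > >>> ceph.conf), something like this should clear it and show what an OSD
> > >>> actually ends up with - osd.0 just being an example:
> > >>>
> > >>> ceph config rm osd osd_memory_target
> > >>> ceph config get osd.0 osd_memory_target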
> > >>>
> > >>> Regards,
> > >>> Eugen
> > >>>
> > >>> Zitat von Mara Sophie Grosch <littlefox@xxxxxxxxxx>:
> > >>>
> > >>>> Hi,
> > >>>>
> > >>>> I have a currently-down ceph cluster
> > >>>> * v17.2.0 / quay.io/v17.2.0-20220420
> > >>>> * 3 nodes, 4 OSDs
> > >>>> * around 1TiB used/3TiB total
> > >>>> * probably enough resources
> > >>>> - two of those nodes have 64GiB memory, the third has 16GiB
> > >>>> - one of the 64GiB nodes runs two OSDs, as it's a physical node with
> > >>>>  2 NVMe drives
> > >>>> * provisioned via Rook and running in my Kubernetes cluster
> > >>>>
> > >>>> After some upgrades yesterday (system packages on the nodes) and
> > >>>> today (Kubernetes to the latest version), I wanted to reboot my
> > >>>> nodes. The drain of the first node put a lot of stress on the other
> > >>>> OSDs, making them go OOM - but I think that is probably a bug
> > >>>> already, as at least one of those nodes has enough resources (64GiB
> > >>>> memory, physical machine, ~40GiB surely free - but I don't have
> > >>>> metrics right now as everything is down).
> > >>>>
> > >>>> I'm now seeing all OSDs going OOM right on startup. From what it
> > >>>> looks like, everything is fine until right after `load_pgs` - as
> > >>>> soon as it activates some PGs, memory usage increases _a lot_ (from
> > >>>> ~4-5GiB RES before to... 60GiB, though that depends on the free
> > >>>> memory on the node).
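> > >>>>
> > >>>> (Side note for anyone hitting the same thing: the kills themselves
> > >>>> show up on the node in `dmesg | grep -i oom`, and if metrics-server
> > >>>> is installed, `kubectl top pod -n rook-ceph` - assuming Rook's
> > >>>> default namespace - shows the RES growth live.)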
> > >>>>
> > >>>> Because of this, I cannot get any of them online again and need
> > >>>> advice on what to do and what info might be useful. Logs of one of
> > >>>> those OSDs are here[1] (captured via kubectl logs, so something
> > >>>> right from the start might be missing - happy to dig deeper if you
> > >>>> need more) and my changed ceph.conf entries are here[2]. I had
> > >>>> `bluefs_buffered_io = false` until today and changed it to true
> > >>>> based on a suggestion in another debug thread[3].
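> > >>>>
> > >>>> Once an OSD stays up long enough, the effective value can be checked
> > >>>> cluster-wide or per daemon (osd.X being a placeholder) with:
> > >>>>
> > >>>> ceph config get osd bluefs_buffered_io
> > >>>> ceph daemon osd.X config get bluefs_buffered_io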
> > >>>>
> > >>>> Any hint is greatly appreciated, many thanks
> > >>>> Mara Grosch
> > >>>>
> > >>>> [1] https://pastebin.com/VFczNqUk
> > >>>> [2] https://pastebin.com/QXust5XD
> > >>>> [3] https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/CBPXLPWEVZLZE55WAQSMB7KSIQPV5I76/
> > >>
> > >>
> > >>
> >
> >
> >
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


