Re: OSDs getting OOM-killed right after startup

Hi,

good catch with the way too low memory target - I wanted to configure 1
GiB, not 1 MiB. I'm aware even that is low, but I removed the line
anyway for testing - sadly, it didn't change anything.
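For reference, osd_memory_target is given in bytes, so a 1 GiB target
would look roughly like this in ceph.conf (shown only as an
illustration - the default of about 4 GiB is usually the better choice):

   osd_memory_target = 1073741824   # 1 GiB (default is 4294967296, i.e. 4 GiB)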

I customize the config mostly to deal with problems I have: something in
my setup makes the OSDs eat lots of memory in normal operation, just
gradually increasing over time. I'd send metrics if my monitoring were
up again ^^' Suspecting that this might be some form of cache was the
reason for that config line (it does not seem to be cache).
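In case it helps anyone debugging something similar: a rough way to see
whether that memory actually sits in caches is to dump the OSD's mempool
stats (osd.3 is just a placeholder id here):

   # via the admin socket inside the OSD container
   ceph daemon osd.3 dump_mempools

   # or remotely, if the tell interface is reachable
   ceph tell osd.3 dump_mempools

The bluestore_cache_* and osd_pglog pools usually show where most of the
memory ends up.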

I have found a way to deal with my cluster: using ceph-objectstore-tool
to export-remove all PGs from the OSDs, getting them online and happy
again, and then importing a few PGs at a time into one OSD and letting
it backfill to the others.
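For the archives, the procedure looks roughly like this - paths and the
PG id are only examples, and the OSD has to be stopped while
ceph-objectstore-tool is running:

   # list the PGs stored on the (stopped) OSD
   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 --op list-pgs

   # export a PG to a file and remove it from the OSD
   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
       --pgid 2.1a --op export-remove --file /backup/pg-2.1a

   # later, import it into an OSD again and let backfill distribute it
   ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
       --op import --file /backup/pg-2.1a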

The excessive memory use on startup only manifests with some of the PGs,
but for those it goes up to ~50 GiB. The problem: when an OSD has
multiple of those PGs in its storage, it handles them in parallel -
needing even more memory, of course, until it finally gets OOM-killed.

So it seems I can get my cluster running again, only limited by my
internet upload now. Any hints on why it eats so much memory in normal
operation would still be appreciated.

Best, Mara


On Wed, Jun 08, 2022 at 09:05:52AM +0000, Eugen Block wrote:
It's even worse: you only give them 1 MB, not 1 GB.

Quoting Eugen Block <eblock@xxxxxx>:

Hi,

is there any reason you use custom configs? Most of the defaults work well. But you only give your OSDs 1 GB of memory, which is way too low for anything but an idle cluster without much data. I recommend removing the line

   osd_memory_target = 1048576

and let Ceph handle it. I haven't installed Quincy yet, but in a healthy cluster the OSDs will take around 3 GB of memory, maybe 4, so you should be good with your setup.
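If in doubt, the value an OSD actually runs with can be checked at
runtime, for example (osd.0 being just an example id):

   ceph config show osd.0 osd_memory_target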

Regards,
Eugen

Quoting Mara Sophie Grosch <littlefox@xxxxxxxxxx>:

Hi,

I have a currently-down ceph cluster
* v17.2.0 / quay.io/v17.2.0-20220420
* 3 nodes, 4 OSDs
* around 1TiB used/3TiB total
* probably enough resources
- two of those nodes have 64GiB memory, the third has 16GiB
- one of the 64GiB nodes runs two OSDs, as it's a physical node with
  2 NVMe drives
* provisioned via Rook and running in my Kubernetes cluster

After some upgrades yesterday (system packages on the nodes) and today
(Kubernetes to the latest version), I wanted to reboot my nodes. The
drain of the first node put a lot of stress on the other OSDs, making
them go OOM - but I think that is probably a bug already, as at least
one of those nodes has enough resources (64 GiB memory, physical
machine, ~40 GiB surely free - but I don't have metrics right now, as
everything is down).

I'm now seeing all OSDs going OOM right on startup. From what it looks
like, everything is fine until right after `load_pgs` - as soon as an
OSD activates some PGs, memory usage increases _a lot_ (from ~4-5 GiB
RES before to .. 60 GiB, though that depends on the free memory on the
node).

Because of this, I cannot get any of them online again and need advice
on what to do and what info might be useful. Logs of one of those OSDs
are here[1] (captured via kubectl logs, so something right from the
start might be missing - happy to dig deeper if you need more) and my
changed ceph.conf entries are here[2]. I had `bluefs_buffered_io =
false` until today and changed it to true based on a suggestion in
another debug thread[3].
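For context, that change is just a plain ceph.conf entry in the OSD
section (shown only as an illustration of what was toggled, not as a
recommendation either way):

   [osd]
   bluefs_buffered_io = true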

Any hint is greatly appreciated, many thanks
Mara Grosch

[1] https://pastebin.com/VFczNqUk
[2] https://pastebin.com/QXust5XD
[3] https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/message/CBPXLPWEVZLZE55WAQSMB7KSIQPV5I76/



_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
