Hi Luis,
Can you do a "ceph tell osd.<num> perf dump" and "ceph daemon osd.<num>
dump_mempools"? Those should help us understand how much memory is
being used by different parts of the OSD/bluestore and how much memory
the priority cache thinks it has to work with.
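For example, something like this (using osd.0 as a stand-in; since the
daemons are containerized under cephadm, the "ceph daemon" form has to
run where the admin socket lives, e.g. via "cephadm enter" on the host
carrying that OSD; the exact invocation may vary slightly by version):

   # from any node with an admin keyring
   ceph tell osd.0 perf dump > osd.0-perf.json

   # on the node hosting osd.0, inside the daemon's container
   cephadm enter --name osd.0 ceph daemon osd.0 dump_mempools > osd.0-mempools.json

The per-pool "bytes" totals in the mempool output (buffer_anon,
osd_pglog, the bluestore_cache_* pools, etc.) together with the
priority cache counters in the perf dump should show where the memory
is going.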
Mark
On 7/11/23 4:57 AM, Luis Domingues wrote:
Hi everyone,
We recently migrated a cluster from ceph-ansible to cephadm. Everything went as expected.
But now we are getting alerts about high memory usage. The cluster is running ceph 16.2.13.
Of course, after adoption OSDs ended up in the <unmanaged> zone:
NAME PORTS RUNNING REFRESHED AGE PLACEMENT
osd 88 7m ago - <unmanaged>
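As far as I understand, adopted OSDs stay <unmanaged> until an OSD
service spec claims them. A minimal spec, applied with "ceph orch apply
-i osd-spec.yaml", would look roughly like the sketch below; the
service_id, host_pattern and device filter are only placeholders:

   service_type: osd
   service_id: default_drives      # placeholder name
   placement:
     host_pattern: '*'             # placeholder: match the intended hosts
   spec:
     data_devices:
       all: true                   # placeholder: adjust to the real devices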
But the weirdest thing I observed is that the OSDs seem to use more memory than the mem limit:
NAME HOST PORTS STATUS REFRESHED AGE MEM USE MEM LIM VERSION IMAGE ID CONTAINER ID
osd.0 <node> running (5d) 2m ago 5d 19.7G 6400M 16.2.13 327f301eff51 ca07fe74a0fa
osd.1 <node> running (5d) 2m ago 5d 7068M 6400M 16.2.13 327f301eff51 6223ed8e34e9
osd.10 <node> running (5d) 10m ago 5d 7235M 6400M 16.2.13 327f301eff51 073ddc0d7391
osd.100 <node> running (5d) 2m ago 5d 7118M 6400M 16.2.13 327f301eff51 b7f9238c0c24
Does anybody know why the OSDs would use more memory than the limit?
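For reference, as far as I can tell the MEM LIM column comes from
osd_memory_target; the effective value can be checked with something
like this (osd.0 as an example):

   ceph config get osd.0 osd_memory_target           # value in the config database
   ceph config show osd.0 osd_memory_target          # value the running daemon reports
   ceph config get osd osd_memory_target_autotune    # whether cephadm autotuning is enabled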
Thanks
Luis Domingues
Proton AG
--
Best Regards,
Mark Nelson
Head of R&D (USA)
Clyso GmbH
p: +49 89 21552391 12
a: Loristraße 8 | 80335 München | Germany
w: https://clyso.com | e: mark.nelson@xxxxxxxxx
We are hiring: https://www.clyso.com/jobs/
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx