Dear Frederic,

Thanks a lot for the suggestions. We are running the vanilla Linux 4.19 LTS kernel. Do you think we may be suffering from the same bug?

Best regards,
Samuel

huxiaoyu@xxxxxxxxxxxx

From: Frédéric Nass
Date: 2024-01-12 09:19
To: huxiaoyu
CC: ceph-users
Subject: Re: Ceph Nautilus 14.2.22 slow OSD memory leak?

Hello,

We recently had a similar situation where OSDs used far more memory than osd_memory_target and were OOM-killed by the kernel. It was due to a kernel bug related to cgroups [1]. If num_cgroups in the output below keeps increasing, you may be hitting this bug.

    $ cat /proc/cgroups | grep -e subsys -e blkio | column -t
    #subsys_name  hierarchy  num_cgroups  enabled
    blkio         4          1099         1

If you are hitting this bug, upgrading the kernel on the OSD nodes should get you through. If you can't access the Red Hat KB [1], let me know the kernel version your nodes are running and I'll check for you.

Regards,
Frédéric.

[1] https://access.redhat.com/solutions/7014337

From: huxiaoyu <huxiaoyu@xxxxxxxxxxxx>
To: ceph-users <ceph-users@xxxxxxx>
Sent: Wednesday, 10 January 2024 19:21 CET
Subject: Ceph Nautilus 14.2.22 slow OSD memory leak?

Dear Ceph folks,

I am responsible for two Ceph clusters running Nautilus 14.2.22, one with replication 3 and the other with EC 4+2. After around 400 days of running quietly and smoothly, both clusters recently developed similar problems: some OSDs consume about 18 GB of memory while the memory target is set to 2 GB.

What could be going wrong in the background? Does this point to a slow OSD memory leak in 14.2.22 that I am not yet aware of?

I would greatly appreciate any clues, ideas, or comments.

Best regards,
Samuel

huxiaoyu@xxxxxxxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
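
Note on the check suggested in the thread: the test is whether num_cgroups for blkio keeps increasing, so a single reading is not enough and two readings taken some time apart are needed. A minimal sketch of that comparison follows. The function names num_cgroups and cgroup_growth are hypothetical helpers, not part of Ceph or the kernel, and the one-hour interval in the example is an arbitrary assumption.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of any tool.

# num_cgroups CONTROLLER [FILE]
# Print the num_cgroups value (third column) for one controller from a
# /proc/cgroups-style file. FILE defaults to /proc/cgroups.
num_cgroups() {
    awk -v name="$1" '$1 == name { print $3 }' "${2:-/proc/cgroups}"
}

# cgroup_growth BEFORE AFTER
# Print how much the count changed between two readings.
cgroup_growth() {
    echo $(( $2 - $1 ))
}

# Example, run by hand on an OSD node (interval is an assumption):
#   before=$(num_cgroups blkio)
#   sleep 3600
#   after=$(num_cgroups blkio)
#   echo "blkio num_cgroups changed by $(cgroup_growth "$before" "$after")"
```

A count that climbs steadily across repeated readings would match the symptom described above. A stable or fluctuating count suggests the cause lies elsewhere.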