Hi,
We have a Luminous cluster that was recently upgraded from Hammer -> Jewel -> Luminous 12.2.8. Since the upgrade, a few nodes have been running out of memory and dying, and the logs show the OOM killer being invoked. We did not have this issue before the upgrade. The only difference we can find is hardware: the nodes without the issue are R730xd, and the ones with the memory leak are R740xd. The hardware vendor doesn't see anything wrong with the hardware. On the Ceph side, the cluster itself runs without any problems; the only issue is the memory leak. For now we are rebooting the affected nodes on a regular schedule to avoid crashes. On one R740xd node we set the weight of all its OSDs to 0.0, and there is no memory leak on that node. Any pointers on how to fix this would be helpful.
Thanks,
Pardhiv Karri
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com