Hi Alexander,

Now that you mention it, I did indeed start disabling swap around that time. However, the effect is not immediate, and it may or may not be related. In the plot you can see that there is no swap usage after the restart in October (I disabled swap on that occasion). However, the fast growth is still there. After the restart in November, the growth is initially just as bad, but then suddenly slows down at 240G. After yet another restart in December, there is no growth any more and usage is stable at around 180G.

I had only a small fraction of total RAM configured as swap. It sounds rather odd that the mere availability of some swap causes a total and unlimited overrun of available RAM, all the way to OOM kills. As you can see from my plot, the total resident usage is now much smaller than the usage at restart time. This must be a very weird piece of code that goes crazy if swap is present and stops being crazy after disabling swap and restarting a few times. I would consider this a serious bug if this is really the current behaviour of the memory allocator.

Thanks for your hint and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Alexander Sporleder <asporleder@xxxxxxxxxx>
Sent: 02 March 2022 13:55:26
To: Frank Schilder; Mark Nelson; Dan van der Ster; ceph-users
Subject: Re: Re: OSD memory leak?

Hello Frank,

I had similar problems.

https://www.mail-archive.com/ceph-users@xxxxxxx/msg11772.html

I disabled swap and now everything is fine.

Best,
Alex


On Tuesday, 01.03.2022 at 08:28 +0000, Frank Schilder wrote:
> Dear all,
>
> there is a new development on this old case. After expanding our cluster and adding more disks to each host, the
> memory leak got really bad and I had to restart all OSDs every 3-4 weeks. Strangely enough, after the 3rd restart
> following our latest RAM extension, the problem disappeared. There was no change in configuration.
> It is as if the problem just disappeared from one day to the next. I did the last restart the day before my
> Christmas holidays, and since then the RAM usage has been stable. I might now even increase the memory limit a bit.
>
> A graph of the recorded memory usage for one of the servers is attached (it's the same for all servers). In case the
> attachment is removed, the image is available here: https://imgur.com/Ly17OTX.
>
> For completeness, the restart of all OSDs was not a reboot of the servers. It was:
>
> 1) set noout
> 2) for each host:
>    2.1) stop docker
>    2.2) wait for peering
>    2.3) start docker
>    2.4) wait for peering
> 3) unset noout
>
> All observations were made on the latest Mimic release.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx