Hi Christoph,

how fast do your OSDs run up to 4G with a 1G setting? You might be hit by up to 2 problems I'm facing on a mimic cluster:

The OSD daemons have a memory leak and will slowly run over their limit. I run about 32 OSDs per host with 196G RAM and need to reboot every 3-4 weeks with a memory target of 2G.

The other issue is that the OSD daemons ignore certain values from the config database in a really bizarre way, and you might need an OSD restart despite the docs saying it's changeable at run time. I didn't have time to send a ticket to the tracker. What I found is that even though "ceph config show" and "ceph daemon osd.nnn config show" show that a memory limit of 2G should be active, this is only true if the value comes from either a per-OSD setting or the global default. The values for hdds or device classes are ignored. This is somewhat difficult to observe unless you can restart OSDs with all sorts of combinations of values set, and I didn't have time yet to submit a report with the data I collected.

You should also check that the old-style cache setting is disabled. Coming from luminous, you might have non-default values in the ceph.conf, disabling the memory_target setting and falling back to old cache handling.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Mark Nelson <mnelson@xxxxxxxxxx>
Sent: 05 May 2021 17:15:50
To: ceph-users@xxxxxxx
Subject: Re: Out of Memory after Upgrading to Nautilus

Hi Christoph,

1GB per OSD is tough! The osd memory target only shrinks the size of the caches but can't control things like osd map size, pg log length, rocksdb wal buffers, etc. It's a "best effort" algorithm to try to fit the OSD mapped memory into that target, but on its own it doesn't really do well below 2GB/OSD (and even that can be tough when only adjusting the caches). That's one of the reasons the default is 4GB.
To fit in 1GB you'll probably also need to reduce some of the previously mentioned things, but there will be consequences (slower recovery, higher write amplification in rocksdb, etc). By default a bluestore OSD typically won't fit into a 1GB memory target and we don't regularly test configurations with that little memory per OSD. You might want to look at the memory pool performance counters, the priority cache performance counters, and the tcmalloc heap stats to help figure out where the memory is actually being used.

Mark

On 5/5/21 9:30 AM, Christoph Adomeit wrote:
> I manage a historical cluster of several ceph nodes, each with 128 GB RAM and 36 OSDs of 8 TB each.
>
> The cluster is just for archive purposes and performance is not so important.
>
> The cluster was running fine for a long time using ceph luminous.
>
> Last week I updated it to Debian 10 and Ceph Nautilus.
>
> Now I can see that the memory usage of each osd grows slowly to 4 GB each, and once the system has
> no memory left it will oom-kill processes.
>
> I have already configured osd_memory_target = 1073741824 .
> This helps for some hours, but then memory usage will grow from 1 GB to 4 GB per OSD.
>
> Any ideas what I can do to further limit osd memory usage?
>
> It would be good to keep the hardware running some more time without upgrading RAM on all
> OSD machines.
>
> Any ideas?
>
> Thanks
> Christoph
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
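
The memory figures quoted in this thread can be sanity-checked with a little arithmetic. The following is a minimal sketch in Python, assuming "GB" is read as GiB and that memory needed by the OS and other daemons is ignored unless passed in as a reserve. The helper name host_memory_budget and the reserve_gib parameter are hypothetical and not part of Ceph.

```python
GIB = 1024 ** 3  # bytes per GiB; osd_memory_target is given in bytes


def host_memory_budget(ram_gib, osd_count, per_osd_gib, reserve_gib=0.0):
    """Return (total OSD demand, remaining headroom) in GiB for one host.

    Hypothetical helper for illustration only. A negative headroom means
    the OSDs together want more memory than the host has.
    """
    demand = osd_count * per_osd_gib
    headroom = ram_gib - reserve_gib - demand
    return demand, headroom


# The value from the original post, osd_memory_target = 1073741824, is 1 GiB.
target_gib = 1073741824 / GIB

# Christoph's hosts: 128 GB RAM, 36 OSDs, at 1, 2 and 4 GB per OSD.
for per_osd in (target_gib, 2, 4):
    demand, headroom = host_memory_budget(128, 36, per_osd)
    print(f"{per_osd:g} GiB/OSD: demand {demand:g} GiB, headroom {headroom:g} GiB")

# Largest per-OSD footprint that still fits with no reserve at all.
max_per_osd = 128 / 36
print(f"break-even per OSD: {max_per_osd:.2f} GiB")
```

Under these assumptions, 36 OSDs at 4 GB each want 144 GB, which exceeds the 128 GB installed and is consistent with the OOM kills described, while a 2 GB footprint per OSD would need 72 GB.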