On Tue, Oct 15, 2024, 1:38 PM Anthony D'Atri <aad@xxxxxxxxxxxxxx> wrote:

> > On Oct 15, 2024, at 1:06 PM, Dave Hall <kdhall@xxxxxxxxxxxxxx> wrote:
> >
> > Hello.
> >
> > I'm seeing the following in the Dashboard -> Configuration panel
> > for osd_memory_target:
> >
> > Default:
> > 4294967296
> >
> > Current Values:
> > osd: 9797659437,
> > osd: 10408081664,
> > osd: 11381160192,
> > osd: 22260320563
> >
> > I have 4 hosts in the cluster right now - all OSD+MGR+MON. 3 have 128GB
> > RAM, the 4th has 256GB.
>
> https://docs.ceph.com/en/reef/cephadm/services/osd/#automatically-tuning-osd-memory
>
> You have autotuning enabled, and it’s trying to use all of your physmem.
> I don’t know offhand how Ceph determines the amount of available memory, if
> it looks specifically for physmem or if it only looks at vmem. If it looks
> at vmem that arguably could be a bug.
>
> > On the host with 256GB, top shows some OSD
> > processes with very high VIRT and RES values - the highest VIRT OSD has
> > 13.0g. The highest RES is 8.5g.
> >
> > All 4 systems are currently swapping, but the 256GB system has much higher
> > swap usage.
> >
> > I am confused why I have 4 current values for osd_memory_target, and
> > especially about the 4th one at 22GB.
> >
> > Also, I'm recalling that there might be a recommendation to disable swap,
> > and I could easily do 'swapoff -a' when the swap usage is lower than the
> > free RAM.
>
> I tend to advise not using swap at all. Suggest disabling swap in fstab,
> then serially rebooting your OSD nodes, of course waiting for recovery
> between each before proceeding to the next.
>
> > Can anybody shed any light on this?
> >
> > Thanks.
> >
> > -Dave
> >
> > --
> > Dave Hall
> > Binghamton University
> > kdhall@xxxxxxxxxxxxxx
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx

The swap recommendation is a contentious one - I, for one, have always been against it. IMHO, disabling swap is a recommendation that comes up because folks are afraid of their OSDs becoming sluggish when their hosts become oversubscribed. But why not just avoid oversubscription altogether?

If you set appropriate OSD memory targets, set kernel swappiness to something like 10-20, and properly pin your OSDs in a system with more than one NUMA node so that they're evenly distributed across NUMA nodes, your kernel will not swap because it simply has no reason to.

Because we left swap enabled, we actually discovered that we had been giving up tons of performance. When we dug into the swapping we had seen in some cases, we found that the NUMA page balancer in the kernel had been shuffling pages around constantly before we NUMA pinned the OSD processes. If we had just disabled swap, the OSDs would still have become sluggish, and identifying why would have been a lot harder, because the effect was not enough to make performance tank; it just started dropping off somewhat when pages started dancing between nodes.

Ever since we NUMA pinned our OSDs and set OSD memory targets appropriately, not a byte has been swapped to disk in over a year across a huge farm of OSDs (and they got noticeably faster, too).

Tyler
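
As a rough aid for the "set appropriate OSD memory targets" step, here is a minimal Python sketch. It converts the byte values reported in the Dashboard to GiB and estimates a per-OSD target the way the cephadm autotuning documentation linked earlier in the thread describes it: a fraction of host memory, minus memory used by non-OSD daemons, divided by the number of OSDs. The function names are made up for illustration, the 0.7 ratio is my understanding of the cephadm default rather than something stated in this thread, and the OSD count and non-OSD reservation in the example are hypothetical, since the thread gives neither.

```python
GIB = 1024 ** 3  # bytes per GiB


def to_gib(num_bytes: int) -> float:
    """Convert a byte count (as shown in the Dashboard) to GiB."""
    return num_bytes / GIB


def estimate_osd_memory_target(total_bytes, num_osds, ratio=0.7, non_osd_bytes=0):
    """Estimate a per-OSD memory target in bytes.

    Assumption: budget = total_bytes * ratio - non_osd_bytes, split evenly
    across the OSDs on the host. ratio=0.7 is assumed, not taken from the thread.
    """
    if num_osds <= 0:
        raise ValueError("num_osds must be positive")
    budget = total_bytes * ratio - non_osd_bytes
    if budget <= 0:
        raise ValueError("no memory left for OSDs with these settings")
    return int(budget // num_osds)


if __name__ == "__main__":
    # Values reported in the thread (default first, then the four current values).
    for value in (4294967296, 9797659437, 10408081664, 11381160192, 22260320563):
        print(f"{value:>12} bytes = {to_gib(value):6.2f} GiB")

    # Hypothetical host: 128 GiB RAM, 8 OSDs, 8 GiB reserved for MON/MGR.
    target = estimate_osd_memory_target(128 * GIB, num_osds=8, non_osd_bytes=8 * GIB)
    print(f"estimated per-OSD target: {target} bytes ({to_gib(target):.2f} GiB)")
```

If the estimate looks too aggressive for a host that also runs MON and MGR daemons, one option is to turn autotuning off and set a fixed value, for example with `ceph config set osd osd_memory_target_autotune false` followed by `ceph config set osd osd_memory_target <bytes>`.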