> On Mar 19, 2025, at 4:44 AM, Frédéric Nass <frederic.nass@xxxxxxxxxxxxxxxx> wrote:
>
>
> Hi Brian,
>
> TL;DR: bluefs_buffered_io = true and SWAP-enabled OSD nodes do not work well together.
>
> Please review these two PRs [1] and [2] to understand the rationale behind bluefs_buffered_io and why its default value has changed over time and Ceph releases (from true to false, then back to true again).
> The reason for changing it to false in the past was due to an observed situation where bluefs_buffered_io = true led to excessive SWAP usage.
>
> As Josh mentioned, whether 'true' or 'false' is better for you depends on your workload, how your cluster is built (collocated OSDs or not), and whether SWAP is enabled on your OSD nodes.
>
> For example, our cluster is used for many different workloads, some of which use OMAP extensively. It consists of non-collocated OSDs using SSDs/NVMes for RocksDB. We decided to set bluefs_buffered_io back to true (when it defaulted to false) and disable SWAP on all nodes because we were experiencing slow requests during snap trimming with bluefs_buffered_io = false.
>
> What I would recommend you try is to disable SWAP on all nodes (swap was good in the 80's :-)) ! See below.
> and leave bluefs_buffered_io enabled.
>
> Regards,
> Frédéric
>
> [1] https://github.com/ceph/ceph/pull/34224
> [2] https://github.com/ceph/ceph/pull/38044
>
>
> ----- On 19 Mar 25, at 1:11, Brian Marcotte marcotte@xxxxxxxxx wrote:
>
>>> The setting you're looking for is bluefs_buffered_io. This is very
>>> much a YMMV setting, so it's best to test with both modes, but I
>>> usually recommend turning it off for all but omap-intensive workloads
>>> (e.g. RGW index) ...
>>
>> We're not using RGW, only RBD.
>>
>> Currently I find it hard to prevent Linux from swapping at least a little
>> no matter what vm settings I use.

The only way to win is not to play. Swap is an anachronism from the days of 3MB diskless workstations. Yes, 3MB, like the Sun 2/50.
Swapping to an ND partition on a Fuji 2351 Eagle. Swap can’t be used if it isn’t provisioned, and if it’s provisioned, it should be disabled and that partition space merged with an adjacent filesystem. If swap is there because of inadequate physmem, DIMMs are relatively affordable.
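If you follow the advice to run OSD nodes without swap, it helps to confirm on each node that nothing is still swapping. Below is a minimal Python sketch, assuming a Linux host that exposes /proc/swaps (a header line followed by one line per active swap area). The helper names are hypothetical and are not part of Ceph. The sketch only reads state. Swap itself is turned off with `swapoff -a`, and the swap entry has to be removed or commented out in /etc/fstab so that it stays off after a reboot.

```python
"""Minimal sketch: report whether a Linux node has any active swap.

Assumes the /proc/swaps format: a header line ("Filename Type Size Used
Priority") followed by one line per active swap area. Helper names are
hypothetical, not Ceph tooling.
"""
from pathlib import Path
from typing import List


def active_swap_areas(proc_swaps_text: str) -> List[str]:
    """Return the swap area names listed in /proc/swaps-formatted text."""
    lines = proc_swaps_text.strip().splitlines()
    # Skip the column header; the first field of every other line is the
    # device or file backing that swap area.
    return [line.split()[0] for line in lines[1:] if line.strip()]


def swap_is_disabled(proc_swaps_path: str = "/proc/swaps") -> bool:
    """True if the kernel lists no active swap areas."""
    text = Path(proc_swaps_path).read_text()
    return not active_swap_areas(text)
```

On an OSD node where swap has been turned off, `swap_is_disabled()` should return True. If it returns False, `active_swap_areas(Path("/proc/swaps").read_text())` lists the devices or files that are still in use. The parsing is kept separate from the file read so that the check can be exercised against saved /proc/swaps output from other nodes.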