With the help of community members, I managed to enable RocksDB compression for a test monitor, and it seems to be working well.

The monitor without compression writes about 750 MB to disk in 5 minutes:

4854 be/4 167 4.97 M 755.02 M 0.00 % 0.24 % ceph-mon -n mon.ceph04 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true [rocksdb:low0]

The monitor with LZ4 compression writes about 1/4 of that over the same time period:

2034728 be/4 167 172.00 K 199.27 M 0.00 % 0.06 % ceph-mon -n mon.ceph05 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-stderr=true --default-log-stderr-prefix=debug --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-stderr=true [rocksdb:low0]

This corresponds to the difference in store.db sizes: compressed SST files are smaller, so compactions rewrite less data.

Mon store.db without compression:

# ls -al /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph04/store.db
total 257196
drwxr-xr-x 2 167 167     4096 Oct 16 14:00 .
drwx------ 3 167 167     4096 Aug 31 05:22 ..
-rw-r--r-- 1 167 167  1517623 Oct 16 14:00 3073035.log
-rw-r--r-- 1 167 167 67285944 Oct 16 14:00 3073037.sst
-rw-r--r-- 1 167 167 67402325 Oct 16 14:00 3073038.sst
-rw-r--r-- 1 167 167 62364991 Oct 16 14:00 3073039.sst

Mon store.db with compression:

# ls -al /var/lib/ceph/3f50555a-ae2a-11eb-a2fc-ffde44714d86/mon.ceph05/store.db
total 91188
drwxr-xr-x 2 167 167     4096 Oct 16 14:00 .
drwx------ 3 167 167     4096 Oct 16 13:35 ..
-rw-r--r-- 1 167 167  1760114 Oct 16 14:00 012693.log
-rw-r--r-- 1 167 167 52236087 Oct 16 14:00 012695.sst

There are no apparent downsides thus far. If everything continues to work well, I will try enabling compression on the other monitors.
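In case someone wants to try the same, here is roughly how compression can be enabled. Treat this as a sketch only: the exact option string I applied is not reproduced here, and the mon_rocksdb_options value below is an example, with compression=kLZ4Compression being the relevant part. Putting it in ceph.conf under [mon] ensures it is read when the monitor opens its store:

[mon]
# example option string; compression=kLZ4Compression enables LZ4 for the mon's RocksDB
mon_rocksdb_options = write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true

Then restart the monitor so RocksDB is reopened with compression enabled, e.g. with cephadm:

# ceph orch daemon restart mon.ceph05

New SST files are written compressed right away, and existing ones shrink as they are rewritten by compaction.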
/Z

On Mon, 16 Oct 2023 at 14:57, Zakhar Kirpichenko <zakhar@xxxxxxxxx> wrote:

> The issue persists, although to a lesser extent. Any comments from the
> Ceph team, please?
>
> /Z
>
> On Fri, 13 Oct 2023 at 20:51, Zakhar Kirpichenko <zakhar@xxxxxxxxx> wrote:
>
>> > Some of it is transferable to RocksDB on mons nonetheless.
>>
>> Please point me to relevant Ceph documentation, i.e. a description of
>> how various Ceph monitor and RocksDB tunables affect the operation of
>> monitors, and I'll gladly look into it.
>>
>> > Please point me to such recommendations, if they're on docs.ceph.com
>> > I'll get them updated.
>>
>> These are the recommendations we used when we built our Pacific cluster:
>> https://docs.ceph.com/en/pacific/start/hardware-recommendations/
>>
>> Our drives are 4x larger than recommended by this guide. The drives are
>> rated for < 0.5 DWPD, which is more than sufficient for boot drives and
>> storage of rarely modified files. It is not documented or suggested
>> anywhere that monitor processes write several hundred gigabytes of data
>> per day, exceeding the amount of data written by OSDs, which is why I am
>> not convinced that what we're observing is expected behavior, but it's
>> not easy to get a definitive answer from the Ceph community.
>>
>> /Z
>>
>> On Fri, 13 Oct 2023 at 20:35, Anthony D'Atri <anthony.datri@xxxxxxxxx>
>> wrote:
>>
>>> Some of it is transferable to RocksDB on mons nonetheless.
>>>
>>> but their specs exceed Ceph hardware recommendations by a good margin
>>>
>>> Please point me to such recommendations; if they're on docs.ceph.com
>>> I'll get them updated.
>>>
>>> On Oct 13, 2023, at 13:34, Zakhar Kirpichenko <zakhar@xxxxxxxxx> wrote:
>>>
>>> Thank you, Anthony. As I explained to you earlier, the article you sent
>>> is about RocksDB tuning for BlueStore OSDs, while the issue at hand is
>>> not with the OSDs but rather with the monitors and their RocksDB store.
>>> Indeed, the drives are not enterprise-grade, but their specs exceed the
>>> Ceph hardware recommendations by a good margin, they are used as boot
>>> drives only, and they aren't supposed to be written to continuously at
>>> high rates, which is unfortunately what is happening. I am trying to
>>> determine why it is happening and how the issue can be alleviated or
>>> resolved; unfortunately, monitor RocksDB usage and tunables appear not
>>> to be documented at all.
>>>
>>> /Z
>>>
>>> On Fri, 13 Oct 2023 at 20:11, Anthony D'Atri <anthony.datri@xxxxxxxxx>
>>> wrote:
>>>
>>>> cf. Mark's article I sent you re RocksDB tuning. I suspect that with
>>>> Reef you would experience fewer writes. Universal compaction might
>>>> also help, but in the end this SSD is a client SKU and really not
>>>> suited for enterprise use. If you had the 1 TB SKU you'd get much
>>>> longer life, or you could change the overprovisioning on the ones you
>>>> have.
>>>>
>>>> On Oct 13, 2023, at 12:30, Zakhar Kirpichenko <zakhar@xxxxxxxxx> wrote:
>>>>
>>>> I would very much appreciate it if someone with a better understanding
>>>> of monitor internals and use of RocksDB could please chip in.
>>>>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx