Hi,
> 1. RocksDB options, which I provided to each mon via their configuration
> files, got overwritten during mon redeployment, and I had to re-add
> mon_rocksdb_options.
IIRC, you didn't use extra_entrypoint_args for that option but
added it directly to the container's unit.run file. So it's expected
that it gets removed after an update. If you want it to persist
across container updates, you should consider using extra_entrypoint_args:
cat mon.yaml
service_type: mon
service_name: mon
placement:
  hosts:
  - host1
  - host2
  - host3
extra_entrypoint_args:
  - '--mon-rocksdb-options=write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true,bottommost_compression=kLZ4HCCompression,max_background_jobs=4,max_subcompactions=2'
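You can then apply the spec with the orchestrator (a minimal sketch;
mon.yaml is just the spec file from above):

ceph orch apply -i mon.yaml

cephadm should then redeploy the mons with the extra argument in their
unit.run, so it survives future container updates.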
Regards,
Eugen
Quoting Zakhar Kirpichenko <zakhar@xxxxxxxxx>:
Hi,
I have upgraded my test and production cephadm-managed clusters from
16.2.14 to 16.2.15. The upgrade was smooth and completed without issues.
There were a few things which I noticed after each upgrade:
1. RocksDB options, which I provided to each mon via their configuration
files, got overwritten during mon redeployment, and I had to re-add
mon_rocksdb_options.
2. The monitor debug_rocksdb option got silently reset back to the default
4/5; I had to set it back to 1/5 (see the example after this list).
3. For roughly 2 hours after the upgrade, despite the clusters being
healthy and operating normally, all monitors ran manual compactions very
frequently and wrote to disk at very high rates. For example, production
monitors had their rocksdb:low0 thread writing to store.db at these rates:
monitors without RocksDB compression: ~8 GB/5 min, or ~96 GB/hour;
monitors with RocksDB compression: ~1.5 GB/5 min, or ~18 GB/hour.
After roughly 2 hours with no changes to the cluster, the write rates
dropped to ~0.4-0.6 GB/5 min and ~120 MB/5 min, respectively. The reason
for the frequent manual compactions and high write rates wasn't
immediately apparent.
4. Crash deployment broke the ownership of /var/lib/ceph/FSID/crash and
/var/lib/ceph/FSID/crash/posted, even though I had already fixed it
manually after the upgrade to 16.2.14, which had broken it as well (see
the chown example after this list).
5. Mgr RAM usage appears to be increasing at a slower rate than it did with
16.2.14, although it's too early to tell whether the issue with mgrs
randomly consuming all RAM and getting OOM-killed has been fixed - with
16.2.14 this would normally take several days.
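Regarding point 2: for reference, setting the debug level back was
roughly the following (via the config database, so it persists):

ceph config set mon debug_rocksdb 1/5
ceph config get mon debug_rocksdb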
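And regarding point 4: the manual ownership fix was along the lines of
the following (assuming the ceph UID/GID used inside the containers,
which is 167:167 in the upstream Ceph images; substitute the actual
FSID):

chown -R 167:167 /var/lib/ceph/FSID/crash

The recursive chown covers crash/posted as well.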
Overall, things look good. Thanks to the Ceph team for this release!
Zakhar
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx