We don't use CephFS at all and don't have RBD snapshots apart from some cloning for OpenStack images. The size of the mon stores isn't an issue, it's < 600 MB, but the stores get overwritten often, causing lots of disk writes, and that is an issue for us.

/Z

On Wed, 11 Oct 2023 at 14:37, Eugen Block <eblock@xxxxxx> wrote:

> Do you use many snapshots (rbd or cephfs)? That can cause heavy
> monitor usage; we've seen large mon stores on customer clusters with
> rbd mirroring on a snapshot basis. In a healthy cluster they have mon
> stores of around 2 GB in size.
>
> >> @Eugen: Was there not an option to limit logging to the MON store?
>
> I don't recall at the moment, worth checking though.
>
> Quoting Zakhar Kirpichenko <zakhar@xxxxxxxxx>:
>
> > Thank you, Frank.
> >
> > The cluster is healthy, operating normally, nothing unusual is going
> > on. We observe lots of writes by mon processes into the mon rocksdb
> > stores, specifically:
> >
> > /var/lib/ceph/mon/ceph-cephXX/store.db:
> > 65M 3675511.sst
> > 65M 3675512.sst
> > 65M 3675513.sst
> > 65M 3675514.sst
> > 65M 3675515.sst
> > 65M 3675516.sst
> > 65M 3675517.sst
> > 65M 3675518.sst
> > 62M 3675519.sst
> >
> > The size of the files is not huge, but the monitors rotate and write
> > out these files often, sometimes several times per minute, resulting
> > in lots of data written to disk. The writes coincide with "manual
> > compaction" events logged by the monitors, for example:
> >
> > debug 2023-10-11T11:10:10.483+0000 7f48a3a9b700 4 rocksdb: [compaction/compaction_job.cc:1676] [default] [JOB 70854] Compacting 1@5 + 9@6 files to L6, score -1.00
> > debug 2023-10-11T11:10:10.483+0000 7f48a3a9b700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1697022610487624, "job": 70854, "event": "compaction_started", "compaction_reason": "ManualCompaction", "files_L5": [3675543], "files_L6": [3675533, 3675534, 3675535, 3675536, 3675537, 3675538, 3675539, 3675540, 3675541], "score": -1, "input_data_size": 601117031}
> > debug 2023-10-11T11:10:10.619+0000 7f48a3a9b700 4 rocksdb: [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table #3675544: 2015 keys, 67287115 bytes
> > debug 2023-10-11T11:10:10.763+0000 7f48a3a9b700 4 rocksdb: [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table #3675545: 24343 keys, 67336225 bytes
> > debug 2023-10-11T11:10:10.899+0000 7f48a3a9b700 4 rocksdb: [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table #3675546: 1196 keys, 67225813 bytes
> > debug 2023-10-11T11:10:11.035+0000 7f48a3a9b700 4 rocksdb: [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table #3675547: 1049 keys, 67252678 bytes
> > debug 2023-10-11T11:10:11.167+0000 7f48a3a9b700 4 rocksdb: [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table #3675548: 1081 keys, 67216638 bytes
> > debug 2023-10-11T11:10:11.303+0000 7f48a3a9b700 4 rocksdb: [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table #3675549: 1196 keys, 67245376 bytes
> > debug 2023-10-11T11:10:12.023+0000 7f48a3a9b700 4 rocksdb: [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table #3675550: 1195 keys, 67246813 bytes
> > debug 2023-10-11T11:10:13.059+0000 7f48a3a9b700 4 rocksdb: [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table #3675551: 1205 keys, 67223302 bytes
> > debug 2023-10-11T11:10:13.903+0000 7f48a3a9b700 4 rocksdb: [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table #3675552: 1312 keys, 56416011 bytes
> > debug 2023-10-11T11:10:13.911+0000 7f48a3a9b700 4 rocksdb: [compaction/compaction_job.cc:1415] [default] [JOB 70854] Compacted 1@5 + 9@6 files to L6 => 594449971 bytes
> > debug 2023-10-11T11:10:13.915+0000 7f48a3a9b700 4 rocksdb: (Original Log Time 2023/10/11-11:10:13.920991) [compaction/compaction_job.cc:760] [default] compacted to: base level 5 level multiplier 10.00 max bytes base 268435456 files[0 0 0 0 0 0 9] max score 0.00, MB/sec: 175.8 rd, 173.9 wr, level 6, files in(1, 9) out(9) MB in(0.3, 572.9) out(566.9), read-write-amplify(3434.6) write-amplify(1707.7) OK, records in: 35108, records dropped: 516 output_compression: NoCompression
> > debug 2023-10-11T11:10:13.915+0000 7f48a3a9b700 4 rocksdb: (Original Log Time 2023/10/11-11:10:13.921010) EVENT_LOG_v1 {"time_micros": 1697022613921002, "job": 70854, "event": "compaction_finished", "compaction_time_micros": 3418822, "compaction_time_cpu_micros": 785454, "output_level": 6, "num_output_files": 9, "total_output_size": 594449971, "num_input_records": 35108, "num_output_records": 34592, "num_subcompactions": 1, "output_compression": "NoCompression", "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [0, 0, 0, 0, 0, 0, 9]}
> >
> > The log itself points out the huge write multiplication:
> > write-amplify(1707.7) means roughly 567 MB were rewritten for about
> > 0.33 MB of new input. I wonder whether this is normal and what can be
> > done about it.
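> >
> > In case anyone wants to reproduce the measurement, this is roughly how
> > we quantify the writes independently of the rocksdb log (a sketch only;
> > it assumes a non-containerized mon whose process name matches
> > "ceph-mon", and reading /proc/<pid>/io requires root):
> >
> > # Sample the mon's cumulative block-layer write counter twice, 60 s apart.
> > MON_PID=$(pgrep -o -f ceph-mon)
> > before=$(awk '/^write_bytes/ {print $2}' /proc/"$MON_PID"/io)
> > sleep 60
> > after=$(awk '/^write_bytes/ {print $2}' /proc/"$MON_PID"/io)
> > # Extrapolate the one-minute delta to GiB per day (1440 minutes/day).
> > echo "$(( (after - before) * 1440 / 1024 / 1024 / 1024 )) GiB/day"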
> >
> > /Z
> >
> > On Wed, 11 Oct 2023 at 13:55, Frank Schilder <frans@xxxxxx> wrote:
> >
> >> I need to ask here: where exactly do you observe the hundreds of GB
> >> written per day? Are the mon logs huge? Is it the mon store? Is your
> >> cluster unhealthy?
> >>
> >> We have an octopus cluster with 1282 OSDs, 1650 ceph fs clients and
> >> about 800 librbd clients. Per week our mon logs are about 70M, the
> >> cluster logs about 120M and the audit logs about 70M, and I see
> >> between 100-200 Kb/s of writes to the mon store. That's in the low
> >> single-digit GB range per day. Hundreds of GB per day sounds
> >> completely over the top on a healthy cluster, unless you have MGR
> >> modules changing the OSD/cluster map continuously.
> >>
> >> Is the autoscaler running and doing stuff?
> >> Is the balancer running and doing stuff?
> >> Is backfill going on?
> >> Is recovery going on?
> >> Is your ceph version affected by the "excessive logging to MON store"
> >> issue that was present starting with pacific but should have been
> >> addressed by now?
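> >>
> >> (For the first four questions, the standard ceph CLI gives a quick
> >> answer; run from any node with an admin keyring:
> >>
> >> ceph osd pool autoscale-status   # per-pool autoscaler activity
> >> ceph balancer status             # balancer mode and whether it is active
> >> ceph -s                          # shows any ongoing backfill/recovery
> >>
> >> For the last one, check the release notes of your exact version.)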
> >>
> >> @Eugen: Was there not an option to limit logging to the MON store?
> >>
> >> For the information of readers: we followed old recommendations from
> >> a Dell white paper for building a ceph cluster and have a 1 TB RAID10
> >> array on 6x write-intensive SSDs for the MON stores. After 5 years we
> >> are below 10% wear. The average size of the MON store on a healthy
> >> cluster is 500M-1G, but we have seen it balloon to 100+GB in degraded
> >> conditions.
> >>
> >> Best regards,
> >> =================
> >> Frank Schilder
> >> AIT Risø Campus
> >> Bygning 109, rum S14
> >>
> >> ________________________________________
> >> From: Zakhar Kirpichenko <zakhar@xxxxxxxxx>
> >> Sent: Wednesday, October 11, 2023 12:00 PM
> >> To: Eugen Block
> >> Cc: ceph-users@xxxxxxx
> >> Subject: Re: Ceph 16.2.x mon compactions, disk writes
> >>
> >> Thank you, Eugen.
> >>
> >> I'm interested specifically in finding out whether the huge amount of
> >> data written by monitors is expected. It is eating through the
> >> endurance of our system drives, which were not specced for high
> >> DWPD/TBW, as this is not a documented requirement, and the monitors
> >> produce hundreds of gigabytes of writes per day. I am looking for
> >> ways to reduce the amount of writes, if possible.
> >>
> >> /Z
> >>
> >> On Wed, 11 Oct 2023 at 12:41, Eugen Block <eblock@xxxxxx> wrote:
> >>
> >> > Hi,
> >> >
> >> > what you report is the expected behaviour, at least I see the same
> >> > on all clusters. I can't answer why compaction is required that
> >> > often, but you can control the log level of the rocksdb output:
> >> >
> >> > ceph config set mon debug_rocksdb 1/5 (default is 4/5)
> >> >
> >> > This reduces the log entries, and you wouldn't see the manual
> >> > compaction logs anymore. There are a couple more rocksdb options,
> >> > but I probably wouldn't change too much, only if you know what
> >> > you're doing. Maybe Igor can comment on whether some other tuning
> >> > makes sense here.
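> >> >
> >> > You can confirm the new level is active with:
> >> >
> >> > ceph config get mon debug_rocksdb
> >> >
> >> > which should then report 1/5.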
> >> >
> >> > Regards,
> >> > Eugen
> >> >
> >> > Quoting Zakhar Kirpichenko <zakhar@xxxxxxxxx>:
> >> >
> >> > > Any input from anyone, please?
> >> > >
> >> > > On Tue, 10 Oct 2023 at 09:44, Zakhar Kirpichenko <zakhar@xxxxxxxxx> wrote:
> >> > >
> >> > >> Any input from anyone, please?
> >> > >>
> >> > >> It's another thing that seems to be rather poorly documented:
> >> > >> it's unclear what to expect, what 'normal' behavior should be,
> >> > >> and what can be done about the huge amount of writes by monitors.
> >> > >>
> >> > >> /Z
> >> > >>
> >> > >> On Mon, 9 Oct 2023 at 12:40, Zakhar Kirpichenko <zakhar@xxxxxxxxx> wrote:
> >> > >>
> >> > >>> Hi,
> >> > >>>
> >> > >>> Monitors in our 16.2.14 cluster appear to quite often run
> >> > >>> "manual compaction" tasks:
> >> > >>>
> >> > >>> debug 2023-10-09T09:30:53.888+0000 7f48a329a700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1696843853892760, "job": 64225, "event": "flush_started", "num_memtables": 1, "num_entries": 715, "num_deletes": 251, "total_data_size": 3870352, "memory_usage": 3886744, "flush_reason": "Manual Compaction"}
> >> > >>> debug 2023-10-09T09:30:53.904+0000 7f4899286700 4 rocksdb: [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction starting
> >> > >>> debug 2023-10-09T09:30:53.908+0000 7f48a3a9b700 4 rocksdb: (Original Log Time 2023/10/09-09:30:53.910204) [db_impl/db_impl_compaction_flush.cc:2516] [default] Manual compaction from level-0 to level-5 from 'paxos .. 'paxos; will stop at (end)
> >> > >>> debug 2023-10-09T09:30:53.908+0000 7f4899286700 4 rocksdb: [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction starting
> >> > >>> debug 2023-10-09T09:30:53.908+0000 7f4899286700 4 rocksdb: [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction starting
> >> > >>> debug 2023-10-09T09:30:53.908+0000 7f4899286700 4 rocksdb: [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction starting
> >> > >>> debug 2023-10-09T09:30:53.908+0000 7f4899286700 4 rocksdb: [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction starting
> >> > >>> debug 2023-10-09T09:30:53.908+0000 7f4899286700 4 rocksdb: [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction starting
> >> > >>> debug 2023-10-09T09:30:53.908+0000 7f48a3a9b700 4 rocksdb: (Original Log Time 2023/10/09-09:30:53.911004) [db_impl/db_impl_compaction_flush.cc:2516] [default] Manual compaction from level-5 to level-6 from 'paxos .. 'paxos; will stop at (end)
> >> > >>> debug 2023-10-09T09:32:08.956+0000 7f48a329a700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1696843928961390, "job": 64228, "event": "flush_started", "num_memtables": 1, "num_entries": 1580, "num_deletes": 502, "total_data_size": 8404605, "memory_usage": 8465840, "flush_reason": "Manual Compaction"}
> >> > >>> debug 2023-10-09T09:32:08.972+0000 7f4899286700 4 rocksdb: [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction starting
> >> > >>> debug 2023-10-09T09:32:08.976+0000 7f48a3a9b700 4 rocksdb: (Original Log Time 2023/10/09-09:32:08.977739) [db_impl/db_impl_compaction_flush.cc:2516] [default] Manual compaction from level-0 to level-5 from 'logm .. 'logm; will stop at (end)
> >> > >>> debug 2023-10-09T09:32:08.976+0000 7f4899286700 4 rocksdb: [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction starting
> >> > >>> debug 2023-10-09T09:32:08.976+0000 7f4899286700 4 rocksdb: [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction starting
> >> > >>> debug 2023-10-09T09:32:08.976+0000 7f4899286700 4 rocksdb: [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction starting
> >> > >>> debug 2023-10-09T09:32:08.976+0000 7f4899286700 4 rocksdb: [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction starting
> >> > >>> debug 2023-10-09T09:32:08.976+0000 7f4899286700 4 rocksdb: [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction starting
> >> > >>> debug 2023-10-09T09:32:08.976+0000 7f48a3a9b700 4 rocksdb: (Original Log Time 2023/10/09-09:32:08.978512) [db_impl/db_impl_compaction_flush.cc:2516] [default] Manual compaction from level-5 to level-6 from 'logm .. 'logm; will stop at (end)
> >> > >>> debug 2023-10-09T09:32:12.764+0000 7f4899286700 4 rocksdb: [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction starting
> >> > >>> debug 2023-10-09T09:32:12.764+0000 7f4899286700 4 rocksdb: [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction starting
> >> > >>> debug 2023-10-09T09:32:12.764+0000 7f4899286700 4 rocksdb: [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction starting
> >> > >>> debug 2023-10-09T09:32:12.764+0000 7f4899286700 4 rocksdb: [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction starting
> >> > >>> debug 2023-10-09T09:32:12.764+0000 7f4899286700 4 rocksdb: [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction starting
> >> > >>> debug 2023-10-09T09:32:12.764+0000 7f4899286700 4 rocksdb: [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction starting
> >> > >>> debug 2023-10-09T09:33:29.028+0000 7f48a329a700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1696844009033151, "job": 64231, "event": "flush_started", "num_memtables": 1, "num_entries": 1430, "num_deletes": 251, "total_data_size": 8975535, "memory_usage": 9035920, "flush_reason": "Manual Compaction"}
> >> > >>> debug 2023-10-09T09:33:29.044+0000 7f4899286700 4 rocksdb: [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction starting
> >> > >>> debug 2023-10-09T09:33:29.048+0000 7f48a3a9b700 4 rocksdb: (Original Log Time 2023/10/09-09:33:29.049585) [db_impl/db_impl_compaction_flush.cc:2516] [default] Manual compaction from level-0 to level-5 from 'paxos .. 'paxos; will stop at (end)
> >> > >>> debug 2023-10-09T09:33:29.048+0000 7f4899286700 4 rocksdb: [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction starting
> >> > >>> debug 2023-10-09T09:33:29.048+0000 7f4899286700 4 rocksdb: [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction starting
> >> > >>> debug 2023-10-09T09:33:29.048+0000 7f4899286700 4 rocksdb: [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction starting
> >> > >>> debug 2023-10-09T09:33:29.048+0000 7f4899286700 4 rocksdb: [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction starting
> >> > >>> debug 2023-10-09T09:33:29.048+0000 7f4899286700 4 rocksdb: [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual compaction starting
> >> > >>> debug 2023-10-09T09:33:29.048+0000 7f48a3a9b700 4 rocksdb: (Original Log Time 2023/10/09-09:33:29.050355) [db_impl/db_impl_compaction_flush.cc:2516] [default] Manual compaction from level-5 to level-6 from 'paxos .. 'paxos; will stop at (end)
> >> > >>>
> >> > >>> I have removed a lot of interim log messages to save space.
> >> > >>>
> >> > >>> During each compaction the monitor process writes approximately
> >> > >>> 500-600 MB of data to disk over a short period of time. These
> >> > >>> writes add up to tens of gigabytes per hour and hundreds of
> >> > >>> gigabytes per day.
> >> > >>>
> >> > >>> Monitor rocksdb and compaction options are default:
> >> > >>>
> >> > >>> "mon_compact_on_bootstrap": "false",
> >> > >>> "mon_compact_on_start": "false",
> >> > >>> "mon_compact_on_trim": "true",
> >> > >>> "mon_rocksdb_options": "write_buffer_size=33554432,compression=kNoCompression,level_compaction_dynamic_level_bytes=true",
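> >> > >>>
> >> > >>> If any of this is safe to adjust, the obvious candidate to me
> >> > >>> would be compression=kNoCompression, since compressed SSTs
> >> > >>> should mean fewer bytes written per compaction. As an untested
> >> > >>> sketch only (kLZ4Compression is a standard rocksdb value, the
> >> > >>> mons would need a restart to pick it up, and I have not measured
> >> > >>> the effect):
> >> > >>>
> >> > >>> [mon]
> >> > >>> # Defaults as above, but with LZ4 block compression enabled.
> >> > >>> mon_rocksdb_options = write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true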
> >> > >>>
> >> > >>> Is this expected behavior? Is this something I can adjust in
> >> > >>> order to extend the system storage life?
> >> > >>>
> >> > >>> Best regards,
> >> > >>> Zakhar

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx