Do you use many snapshots (rbd or cephfs)? That can cause a heavy
monitor usage, we've seen large mon stores on customer clusters with
rbd mirroring on snapshot basis. In a healthy cluster they have mon
stores of around 2GB in size.
>> @Eugen: Was there not an option to limit logging to the MON store?
I don't recall at the moment, worth checking tough.
Zitat von Zakhar Kirpichenko <zakhar@xxxxxxxxx>:
> Thank you, Frank.
>
> The cluster is healthy, operating normally, nothing unusual is going on.
We
> observe lots of writes by mon processes into mon rocksdb stores,
> specifically:
>
> /var/lib/ceph/mon/ceph-cephXX/store.db:
> 65M 3675511.sst
> 65M 3675512.sst
> 65M 3675513.sst
> 65M 3675514.sst
> 65M 3675515.sst
> 65M 3675516.sst
> 65M 3675517.sst
> 65M 3675518.sst
> 62M 3675519.sst
>
> The site of the files is not huge, but monitors rotate and write out
these
> files often, sometimes several times per minute, resulting in lots of
data
> written to disk. The writes coincide with "manual compaction" events
logged
> by the monitors, for example:
>
> debug 2023-10-11T11:10:10.483+0000 7f48a3a9b700 4 rocksdb:
> [compaction/compaction_job.cc:1676] [default] [JOB 70854] Compacting 1@5
+
> 9@6 files to L6, score -1.00
> debug 2023-10-11T11:10:10.483+0000 7f48a3a9b700 4 rocksdb: EVENT_LOG_v1
> {"time_micros": 1697022610487624, "job": 70854, "event":
> "compaction_started", "compaction_reason": "ManualCompaction",
"files_L5":
> [3675543], "files_L6": [3675533, 3675534, 3675535, 3675536, 3675537,
> 3675538, 3675539, 3675540, 3675541], "score": -1, "input_data_size":
> 601117031}
> debug 2023-10-11T11:10:10.619+0000 7f48a3a9b700 4 rocksdb:
> [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> #3675544: 2015 keys, 67287115 bytes
> debug 2023-10-11T11:10:10.763+0000 7f48a3a9b700 4 rocksdb:
> [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> #3675545: 24343 keys, 67336225 bytes
> debug 2023-10-11T11:10:10.899+0000 7f48a3a9b700 4 rocksdb:
> [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> #3675546: 1196 keys, 67225813 bytes
> debug 2023-10-11T11:10:11.035+0000 7f48a3a9b700 4 rocksdb:
> [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> #3675547: 1049 keys, 67252678 bytes
> debug 2023-10-11T11:10:11.167+0000 7f48a3a9b700 4 rocksdb:
> [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> #3675548: 1081 keys, 67216638 bytes
> debug 2023-10-11T11:10:11.303+0000 7f48a3a9b700 4 rocksdb:
> [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> #3675549: 1196 keys, 67245376 bytes
> debug 2023-10-11T11:10:12.023+0000 7f48a3a9b700 4 rocksdb:
> [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> #3675550: 1195 keys, 67246813 bytes
> debug 2023-10-11T11:10:13.059+0000 7f48a3a9b700 4 rocksdb:
> [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> #3675551: 1205 keys, 67223302 bytes
> debug 2023-10-11T11:10:13.903+0000 7f48a3a9b700 4 rocksdb:
> [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
> #3675552: 1312 keys, 56416011 bytes
> debug 2023-10-11T11:10:13.911+0000 7f48a3a9b700 4 rocksdb:
> [compaction/compaction_job.cc:1415] [default] [JOB 70854] Compacted 1@5
+
> 9@6 files to L6 => 594449971 bytes
> debug 2023-10-11T11:10:13.915+0000 7f48a3a9b700 4 rocksdb: (Original Log
> Time 2023/10/11-11:10:13.920991) [compaction/compaction_job.cc:760]
> [default] compacted to: base level 5 level multiplier 10.00 max bytes
base
> 268435456 files[0 0 0 0 0 0 9] max score 0.00, MB/sec: 175.8 rd, 173.9
wr,
> level 6, files in(1, 9) out(9) MB in(0.3, 572.9) out(566.9),
> read-write-amplify(3434.6) write-amplify(1707.7) OK, records in: 35108,
> records dropped: 516 output_compression: NoCompression
> debug 2023-10-11T11:10:13.915+0000 7f48a3a9b700 4 rocksdb: (Original Log
> Time 2023/10/11-11:10:13.921010) EVENT_LOG_v1 {"time_micros":
> 1697022613921002, "job": 70854, "event": "compaction_finished",
> "compaction_time_micros": 3418822, "compaction_time_cpu_micros": 785454,
> "output_level": 6, "num_output_files": 9, "total_output_size": 594449971,
> "num_input_records": 35108, "num_output_records": 34592,
> "num_subcompactions": 1, "output_compression": "NoCompression",
> "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0,
> "lsm_state": [0, 0, 0, 0, 0, 0, 9]}
>
> The log even mentions the huge write multiplication. I wonder whether
this
> is normal and what can be done about it.
>
> /Z
>
> On Wed, 11 Oct 2023 at 13:55, Frank Schilder <frans@xxxxxx> wrote:
>
>> I need to ask here: where exactly do you observe the hundreds of GB
>> written per day? Are the mon logs huge? Is it the mon store? Is your
>> cluster unhealthy?
>>
>> We have an octopus cluster with 1282 OSDs, 1650 ceph fs clients and
about
>> 800 librbd clients. Per week our mon logs are about 70M, the cluster
logs
>> about 120M , the audit logs about 70M and I see between 100-200Kb/s
writes
>> to the mon store. That's in the lower-digit GB range per day. Hundreds
of
>> GB per day sound completely over the top on a healthy cluster, unless
you
>> have MGR modules changing the OSD/cluster map continuously.
>>
>> Is autoscaler running and doing stuff?
>> Is balancer running and doing stuff?
>> Is backfill going on?
>> Is recovery going on?
>> Is your ceph version affected by the "excessive logging to MON store"
>> issue that was present starting with pacific but should have been
addressed
>> by now?
>>
>> @Eugen: Was there not an option to limit logging to the MON store?
>>
>> For information to readers, we followed old recommendations from a Dell
>> white paper for building a ceph cluster and have a 1TB Raid10 array on
6x
>> write intensive SSDs for the MON stores. After 5 years we are below 10%
>> wear. Average size of the MON store for a healthy cluster is 500M-1G,
but
>> we have seen this ballooning to 100+GB in degraded conditions.
>>
>> Best regards,
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> ________________________________________
>> From: Zakhar Kirpichenko <zakhar@xxxxxxxxx>
>> Sent: Wednesday, October 11, 2023 12:00 PM
>> To: Eugen Block
>> Cc: ceph-users@xxxxxxx
>> Subject: Re: Ceph 16.2.x mon compactions, disk writes
>>
>> Thank you, Eugen.
>>
>> I'm interested specifically to find out whether the huge amount of data
>> written by monitors is expected. It is eating through the endurance of
our
>> system drives, which were not specced for high DWPD/TBW, as this is not
a
>> documented requirement, and monitors produce hundreds of gigabytes of
>> writes per day. I am looking for ways to reduce the amount of writes, if
>> possible.
>>
>> /Z
>>
>> On Wed, 11 Oct 2023 at 12:41, Eugen Block <eblock@xxxxxx> wrote:
>>
>> > Hi,
>> >
>> > what you report is the expected behaviour, at least I see the same on
>> > all clusters. I can't answer why the compaction is required that
>> > often, but you can control the log level of the rocksdb output:
>> >
>> > ceph config set mon debug_rocksdb 1/5 (default is 4/5)
>> >
>> > This reduces the log entries and you wouldn't see the manual
>> > compaction logs anymore. There are a couple more rocksdb options but I
>> > probably wouldn't change too much, only if you know what you're doing.
>> > Maybe Igor can comment if some other tuning makes sense here.
>> >
>> > Regards,
>> > Eugen
>> >
>> > Zitat von Zakhar Kirpichenko <zakhar@xxxxxxxxx>:
>> >
>> > > Any input from anyone, please?
>> > >
>> > > On Tue, 10 Oct 2023 at 09:44, Zakhar Kirpichenko <zakhar@xxxxxxxxx>
>> > wrote:
>> > >
>> > >> Any input from anyone, please?
>> > >>
>> > >> It's another thing that seems to be rather poorly documented: it's
>> > unclear
>> > >> what to expect, what 'normal' behavior should be, and what can be
done
>> > >> about the huge amount of writes by monitors.
>> > >>
>> > >> /Z
>> > >>
>> > >> On Mon, 9 Oct 2023 at 12:40, Zakhar Kirpichenko <zakhar@xxxxxxxxx>
>> > wrote:
>> > >>
>> > >>> Hi,
>> > >>>
>> > >>> Monitors in our 16.2.14 cluster appear to quite often run "manual
>> > >>> compaction" tasks:
>> > >>>
>> > >>> debug 2023-10-09T09:30:53.888+0000 7f48a329a700 4 rocksdb:
>> > EVENT_LOG_v1
>> > >>> {"time_micros": 1696843853892760, "job": 64225, "event":
>> > "flush_started",
>> > >>> "num_memtables": 1, "num_entries": 715, "num_deletes": 251,
>> > >>> "total_data_size": 3870352, "memory_usage": 3886744,
"flush_reason":
>> > >>> "Manual Compaction"}
>> > >>> debug 2023-10-09T09:30:53.904+0000 7f4899286700 4 rocksdb:
>> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
>> compaction
>> > >>> starting
>> > >>> debug 2023-10-09T09:30:53.908+0000 7f48a3a9b700 4 rocksdb:
(Original
>> > Log
>> > >>> Time 2023/10/09-09:30:53.910204)
>> > [db_impl/db_impl_compaction_flush.cc:2516]
>> > >>> [default] Manual compaction from level-0 to level-5 from 'paxos ..
>> > 'paxos;
>> > >>> will stop at (end)
>> > >>> debug 2023-10-09T09:30:53.908+0000 7f4899286700 4 rocksdb:
>> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
>> compaction
>> > >>> starting
>> > >>> debug 2023-10-09T09:30:53.908+0000 7f4899286700 4 rocksdb:
>> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
>> compaction
>> > >>> starting
>> > >>> debug 2023-10-09T09:30:53.908+0000 7f4899286700 4 rocksdb:
>> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
>> compaction
>> > >>> starting
>> > >>> debug 2023-10-09T09:30:53.908+0000 7f4899286700 4 rocksdb:
>> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
>> compaction
>> > >>> starting
>> > >>> debug 2023-10-09T09:30:53.908+0000 7f4899286700 4 rocksdb:
>> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
>> compaction
>> > >>> starting
>> > >>> debug 2023-10-09T09:30:53.908+0000 7f48a3a9b700 4 rocksdb:
(Original
>> > Log
>> > >>> Time 2023/10/09-09:30:53.911004)
>> > [db_impl/db_impl_compaction_flush.cc:2516]
>> > >>> [default] Manual compaction from level-5 to level-6 from 'paxos ..
>> > 'paxos;
>> > >>> will stop at (end)
>> > >>> debug 2023-10-09T09:32:08.956+0000 7f48a329a700 4 rocksdb:
>> > EVENT_LOG_v1
>> > >>> {"time_micros": 1696843928961390, "job": 64228, "event":
>> > "flush_started",
>> > >>> "num_memtables": 1, "num_entries": 1580, "num_deletes": 502,
>> > >>> "total_data_size": 8404605, "memory_usage": 8465840,
"flush_reason":
>> > >>> "Manual Compaction"}
>> > >>> debug 2023-10-09T09:32:08.972+0000 7f4899286700 4 rocksdb:
>> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
>> compaction
>> > >>> starting
>> > >>> debug 2023-10-09T09:32:08.976+0000 7f48a3a9b700 4 rocksdb:
(Original
>> > Log
>> > >>> Time 2023/10/09-09:32:08.977739)
>> > [db_impl/db_impl_compaction_flush.cc:2516]
>> > >>> [default] Manual compaction from level-0 to level-5 from 'logm ..
>> > 'logm;
>> > >>> will stop at (end)
>> > >>> debug 2023-10-09T09:32:08.976+0000 7f4899286700 4 rocksdb:
>> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
>> compaction
>> > >>> starting
>> > >>> debug 2023-10-09T09:32:08.976+0000 7f4899286700 4 rocksdb:
>> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
>> compaction
>> > >>> starting
>> > >>> debug 2023-10-09T09:32:08.976+0000 7f4899286700 4 rocksdb:
>> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
>> compaction
>> > >>> starting
>> > >>> debug 2023-10-09T09:32:08.976+0000 7f4899286700 4 rocksdb:
>> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
>> compaction
>> > >>> starting
>> > >>> debug 2023-10-09T09:32:08.976+0000 7f4899286700 4 rocksdb:
>> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
>> compaction
>> > >>> starting
>> > >>> debug 2023-10-09T09:32:08.976+0000 7f48a3a9b700 4 rocksdb:
(Original
>> > Log
>> > >>> Time 2023/10/09-09:32:08.978512)
>> > [db_impl/db_impl_compaction_flush.cc:2516]
>> > >>> [default] Manual compaction from level-5 to level-6 from 'logm ..
>> > 'logm;
>> > >>> will stop at (end)
>> > >>> debug 2023-10-09T09:32:12.764+0000 7f4899286700 4 rocksdb:
>> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
>> compaction
>> > >>> starting
>> > >>> debug 2023-10-09T09:32:12.764+0000 7f4899286700 4 rocksdb:
>> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
>> compaction
>> > >>> starting
>> > >>> debug 2023-10-09T09:32:12.764+0000 7f4899286700 4 rocksdb:
>> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
>> compaction
>> > >>> starting
>> > >>> debug 2023-10-09T09:32:12.764+0000 7f4899286700 4 rocksdb:
>> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
>> compaction
>> > >>> starting
>> > >>> debug 2023-10-09T09:32:12.764+0000 7f4899286700 4 rocksdb:
>> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
>> compaction
>> > >>> starting
>> > >>> debug 2023-10-09T09:32:12.764+0000 7f4899286700 4 rocksdb:
>> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
>> compaction
>> > >>> starting
>> > >>> debug 2023-10-09T09:33:29.028+0000 7f48a329a700 4 rocksdb:
>> > EVENT_LOG_v1
>> > >>> {"time_micros": 1696844009033151, "job": 64231, "event":
>> > "flush_started",
>> > >>> "num_memtables": 1, "num_entries": 1430, "num_deletes": 251,
>> > >>> "total_data_size": 8975535, "memory_usage": 9035920,
"flush_reason":
>> > >>> "Manual Compaction"}
>> > >>> debug 2023-10-09T09:33:29.044+0000 7f4899286700 4 rocksdb:
>> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
>> compaction
>> > >>> starting
>> > >>> debug 2023-10-09T09:33:29.048+0000 7f48a3a9b700 4 rocksdb:
(Original
>> > Log
>> > >>> Time 2023/10/09-09:33:29.049585)
>> > [db_impl/db_impl_compaction_flush.cc:2516]
>> > >>> [default] Manual compaction from level-0 to level-5 from 'paxos ..
>> > 'paxos;
>> > >>> will stop at (end)
>> > >>> debug 2023-10-09T09:33:29.048+0000 7f4899286700 4 rocksdb:
>> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
>> compaction
>> > >>> starting
>> > >>> debug 2023-10-09T09:33:29.048+0000 7f4899286700 4 rocksdb:
>> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
>> compaction
>> > >>> starting
>> > >>> debug 2023-10-09T09:33:29.048+0000 7f4899286700 4 rocksdb:
>> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
>> compaction
>> > >>> starting
>> > >>> debug 2023-10-09T09:33:29.048+0000 7f4899286700 4 rocksdb:
>> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
>> compaction
>> > >>> starting
>> > >>> debug 2023-10-09T09:33:29.048+0000 7f4899286700 4 rocksdb:
>> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
>> compaction
>> > >>> starting
>> > >>> debug 2023-10-09T09:33:29.048+0000 7f48a3a9b700 4 rocksdb:
(Original
>> > Log
>> > >>> Time 2023/10/09-09:33:29.050355)
>> > [db_impl/db_impl_compaction_flush.cc:2516]
>> > >>> [default] Manual compaction from level-5 to level-6 from 'paxos ..
>> > 'paxos;
>> > >>> will stop at (end)
>> > >>>
>> > >>> I have removed a lot of interim log messages to save space.
>> > >>>
>> > >>> During each compaction the monitor process writes approximately
>> 500-600
>> > >>> MB of data to disk over a short period of time. These writes add
up
>> to
>> > tens
>> > >>> of gigabytes per hour and hundreds of gigabytes per day.
>> > >>>
>> > >>> Monitor rocksdb and compaction options are default:
>> > >>>
>> > >>> "mon_compact_on_bootstrap": "false",
>> > >>> "mon_compact_on_start": "false",
>> > >>> "mon_compact_on_trim": "true",
>> > >>> "mon_rocksdb_options":
>> > >>>
>> >
>>
"write_buffer_size=33554432,compression=kNoCompression,level_compaction_dynamic_level_bytes=true",
>> > >>>
>> > >>> Is this expected behavior? Is this something I can adjust in
order to
>> > >>> extend the system storage life?
>> > >>>
>> > >>> Best regards,
>> > >>> Zakhar
>> > >>>
>> > >>
>> > > _______________________________________________
>> > > ceph-users mailing list -- ceph-users@xxxxxxx
>> > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>> >
>> >
>> > _______________________________________________
>> > ceph-users mailing list -- ceph-users@xxxxxxx
>> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>> >
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>