Re: Ceph 16.2.x mon compactions, disk writes

Thank you, Frank.

The cluster is healthy and operating normally; nothing unusual is going on. We
observe a lot of writes by the mon processes into their rocksdb stores,
specifically:

/var/lib/ceph/mon/ceph-cephXX/store.db:
65M     3675511.sst
65M     3675512.sst
65M     3675513.sst
65M     3675514.sst
65M     3675515.sst
65M     3675516.sst
65M     3675517.sst
65M     3675518.sst
62M     3675519.sst

The size of the files is not huge, but the monitors rotate and rewrite these
files often, sometimes several times per minute, which results in a lot of data
written to disk. The writes coincide with "manual compaction" events logged
by the monitors, for example:

debug 2023-10-11T11:10:10.483+0000 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1676] [default] [JOB 70854] Compacting 1@5 +
9@6 files to L6, score -1.00
debug 2023-10-11T11:10:10.483+0000 7f48a3a9b700  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1697022610487624, "job": 70854, "event":
"compaction_started", "compaction_reason": "ManualCompaction", "files_L5":
[3675543], "files_L6": [3675533, 3675534, 3675535, 3675536, 3675537,
3675538, 3675539, 3675540, 3675541], "score": -1, "input_data_size":
601117031}
debug 2023-10-11T11:10:10.619+0000 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675544: 2015 keys, 67287115 bytes
debug 2023-10-11T11:10:10.763+0000 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675545: 24343 keys, 67336225 bytes
debug 2023-10-11T11:10:10.899+0000 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675546: 1196 keys, 67225813 bytes
debug 2023-10-11T11:10:11.035+0000 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675547: 1049 keys, 67252678 bytes
debug 2023-10-11T11:10:11.167+0000 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675548: 1081 keys, 67216638 bytes
debug 2023-10-11T11:10:11.303+0000 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675549: 1196 keys, 67245376 bytes
debug 2023-10-11T11:10:12.023+0000 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675550: 1195 keys, 67246813 bytes
debug 2023-10-11T11:10:13.059+0000 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675551: 1205 keys, 67223302 bytes
debug 2023-10-11T11:10:13.903+0000 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated table
#3675552: 1312 keys, 56416011 bytes
debug 2023-10-11T11:10:13.911+0000 7f48a3a9b700  4 rocksdb:
[compaction/compaction_job.cc:1415] [default] [JOB 70854] Compacted 1@5 +
9@6 files to L6 => 594449971 bytes
debug 2023-10-11T11:10:13.915+0000 7f48a3a9b700  4 rocksdb: (Original Log
Time 2023/10/11-11:10:13.920991) [compaction/compaction_job.cc:760]
[default] compacted to: base level 5 level multiplier 10.00 max bytes base
268435456 files[0 0 0 0 0 0 9] max score 0.00, MB/sec: 175.8 rd, 173.9 wr,
level 6, files in(1, 9) out(9) MB in(0.3, 572.9) out(566.9),
read-write-amplify(3434.6) write-amplify(1707.7) OK, records in: 35108,
records dropped: 516 output_compression: NoCompression
debug 2023-10-11T11:10:13.915+0000 7f48a3a9b700  4 rocksdb: (Original Log
Time 2023/10/11-11:10:13.921010) EVENT_LOG_v1 {"time_micros":
1697022613921002, "job": 70854, "event": "compaction_finished",
"compaction_time_micros": 3418822, "compaction_time_cpu_micros": 785454,
"output_level": 6, "num_output_files": 9, "total_output_size": 594449971,
"num_input_records": 35108, "num_output_records": 34592,
"num_subcompactions": 1, "output_compression": "NoCompression",
"num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0,
"lsm_state": [0, 0, 0, 0, 0, 0, 9]}

The log itself reports the huge write amplification (read-write-amplify(3434.6),
write-amplify(1707.7)). I wonder whether this is normal and what can be done
about it.
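
A rough back-of-the-envelope from the log above: this one compaction wrote
~567 MB (the nine output tables) for only ~0.3 MB of new input, which is where
the write-amplify(1707.7) figure comes from, and at ~0.55-0.6 GB per compaction
even a few compactions per hour add up to tens of GB per day. One quick way to
confirm the actual write rate of a mon is to sample its write_bytes counter in
/proc. This is only a rough sketch: run it as root on the mon host, the PID
discovery and the 60-second interval are just placeholders, and containerized
deployments may need a different way to find the process:

  # sample cumulative bytes written by the oldest ceph-mon process over 60s
  MON_PID=$(pgrep -o ceph-mon)
  B1=$(awk '/^write_bytes/ {print $2}' /proc/$MON_PID/io)
  sleep 60
  B2=$(awk '/^write_bytes/ {print $2}' /proc/$MON_PID/io)
  echo "$(( (B2 - B1) / 1024 / 1024 )) MiB written in 60s"

Extrapolated over 24 hours, this should show whether the store.db compactions
alone account for the hundreds of GB per day we are seeing.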

/Z

On Wed, 11 Oct 2023 at 13:55, Frank Schilder <frans@xxxxxx> wrote:

> I need to ask here: where exactly do you observe the hundreds of GB
> written per day? Are the mon logs huge? Is it the mon store? Is your
> cluster unhealthy?
>
> We have an octopus cluster with 1282 OSDs, 1650 ceph fs clients and about
> 800 librbd clients. Per week our mon logs are about 70M, the cluster logs
> about 120M, the audit logs about 70M, and I see between 100-200Kb/s of writes
> to the mon store. That's in the lower-digit GB range per day. Hundreds of
> GB per day sounds completely over the top on a healthy cluster, unless you
> have MGR modules changing the OSD/cluster map continuously.
>
> Is autoscaler running and doing stuff?
> Is balancer running and doing stuff?
> Is backfill going on?
> Is recovery going on?
> Is your ceph version affected by the "excessive logging to MON store"
> issue that was present starting with pacific but should have been addressed
> by now?
>
> @Eugen: Was there not an option to limit logging to the MON store?
>
> For information to readers, we followed old recommendations from a Dell
> white paper for building a ceph cluster and have a 1TB RAID10 array on 6x
> write-intensive SSDs for the MON stores. After 5 years we are below 10%
> wear. The average size of the MON store for a healthy cluster is 500M-1G, but
> we have seen it balloon to 100+GB in degraded conditions.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Zakhar Kirpichenko <zakhar@xxxxxxxxx>
> Sent: Wednesday, October 11, 2023 12:00 PM
> To: Eugen Block
> Cc: ceph-users@xxxxxxx
> Subject:  Re: Ceph 16.2.x mon compactions, disk writes
>
> Thank you, Eugen.
>
> I'm specifically interested in finding out whether the huge amount of data
> written by the monitors is expected. It is eating through the endurance of our
> system drives, which were not specced for high DWPD/TBW since that is not a
> documented requirement, and the monitors produce hundreds of gigabytes of
> writes per day. I am looking for ways to reduce the amount of writes, if
> possible.
>
> /Z
>
> On Wed, 11 Oct 2023 at 12:41, Eugen Block <eblock@xxxxxx> wrote:
>
> > Hi,
> >
> > What you report is the expected behaviour; at least I see the same on
> > all clusters. I can't answer why the compaction is required that
> > often, but you can control the log level of the rocksdb output:
> >
> > ceph config set mon debug_rocksdb 1/5 (default is 4/5)
> >
> > This reduces the log entries, and you won't see the manual
> > compaction logs anymore. There are a couple more rocksdb options, but I
> > probably wouldn't change too much unless you know what you're doing.
> > Maybe Igor can comment if some other tuning makes sense here.
> >
> > Regards,
> > Eugen
> >
> > Zitat von Zakhar Kirpichenko <zakhar@xxxxxxxxx>:
> >
> > > Any input from anyone, please?
> > >
> > > On Tue, 10 Oct 2023 at 09:44, Zakhar Kirpichenko <zakhar@xxxxxxxxx>
> > wrote:
> > >
> > >> Any input from anyone, please?
> > >>
> > >> It's another thing that seems to be rather poorly documented: it's unclear
> > >> what to expect, what 'normal' behavior should be, and what can be done
> > >> about the huge amount of writes by monitors.
> > >>
> > >> /Z
> > >>
> > >> On Mon, 9 Oct 2023 at 12:40, Zakhar Kirpichenko <zakhar@xxxxxxxxx>
> > wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> Monitors in our 16.2.14 cluster appear to run "manual compaction"
> > >>> tasks quite often:
> > >>>
> > >>> debug 2023-10-09T09:30:53.888+0000 7f48a329a700  4 rocksdb:
> > EVENT_LOG_v1
> > >>> {"time_micros": 1696843853892760, "job": 64225, "event":
> > "flush_started",
> > >>> "num_memtables": 1, "num_entries": 715, "num_deletes": 251,
> > >>> "total_data_size": 3870352, "memory_usage": 3886744, "flush_reason":
> > >>> "Manual Compaction"}
> > >>> debug 2023-10-09T09:30:53.904+0000 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:30:53.908+0000 7f48a3a9b700  4 rocksdb: (Original
> > Log
> > >>> Time 2023/10/09-09:30:53.910204)
> > [db_impl/db_impl_compaction_flush.cc:2516]
> > >>> [default] Manual compaction from level-0 to level-5 from 'paxos ..
> > 'paxos;
> > >>> will stop at (end)
> > >>> debug 2023-10-09T09:30:53.908+0000 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:30:53.908+0000 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:30:53.908+0000 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:30:53.908+0000 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:30:53.908+0000 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:30:53.908+0000 7f48a3a9b700  4 rocksdb: (Original
> > Log
> > >>> Time 2023/10/09-09:30:53.911004)
> > [db_impl/db_impl_compaction_flush.cc:2516]
> > >>> [default] Manual compaction from level-5 to level-6 from 'paxos ..
> > 'paxos;
> > >>> will stop at (end)
> > >>> debug 2023-10-09T09:32:08.956+0000 7f48a329a700  4 rocksdb:
> > EVENT_LOG_v1
> > >>> {"time_micros": 1696843928961390, "job": 64228, "event":
> > "flush_started",
> > >>> "num_memtables": 1, "num_entries": 1580, "num_deletes": 502,
> > >>> "total_data_size": 8404605, "memory_usage": 8465840, "flush_reason":
> > >>> "Manual Compaction"}
> > >>> debug 2023-10-09T09:32:08.972+0000 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:32:08.976+0000 7f48a3a9b700  4 rocksdb: (Original
> > Log
> > >>> Time 2023/10/09-09:32:08.977739)
> > [db_impl/db_impl_compaction_flush.cc:2516]
> > >>> [default] Manual compaction from level-0 to level-5 from 'logm ..
> > 'logm;
> > >>> will stop at (end)
> > >>> debug 2023-10-09T09:32:08.976+0000 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:32:08.976+0000 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:32:08.976+0000 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:32:08.976+0000 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:32:08.976+0000 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:32:08.976+0000 7f48a3a9b700  4 rocksdb: (Original
> > Log
> > >>> Time 2023/10/09-09:32:08.978512)
> > [db_impl/db_impl_compaction_flush.cc:2516]
> > >>> [default] Manual compaction from level-5 to level-6 from 'logm ..
> > 'logm;
> > >>> will stop at (end)
> > >>> debug 2023-10-09T09:32:12.764+0000 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:32:12.764+0000 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:32:12.764+0000 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:32:12.764+0000 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:32:12.764+0000 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:32:12.764+0000 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:33:29.028+0000 7f48a329a700  4 rocksdb:
> > EVENT_LOG_v1
> > >>> {"time_micros": 1696844009033151, "job": 64231, "event":
> > "flush_started",
> > >>> "num_memtables": 1, "num_entries": 1430, "num_deletes": 251,
> > >>> "total_data_size": 8975535, "memory_usage": 9035920, "flush_reason":
> > >>> "Manual Compaction"}
> > >>> debug 2023-10-09T09:33:29.044+0000 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:33:29.048+0000 7f48a3a9b700  4 rocksdb: (Original
> > Log
> > >>> Time 2023/10/09-09:33:29.049585)
> > [db_impl/db_impl_compaction_flush.cc:2516]
> > >>> [default] Manual compaction from level-0 to level-5 from 'paxos ..
> > 'paxos;
> > >>> will stop at (end)
> > >>> debug 2023-10-09T09:33:29.048+0000 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:33:29.048+0000 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:33:29.048+0000 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:33:29.048+0000 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:33:29.048+0000 7f4899286700  4 rocksdb:
> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual
> compaction
> > >>> starting
> > >>> debug 2023-10-09T09:33:29.048+0000 7f48a3a9b700  4 rocksdb: (Original
> > Log
> > >>> Time 2023/10/09-09:33:29.050355)
> > [db_impl/db_impl_compaction_flush.cc:2516]
> > >>> [default] Manual compaction from level-5 to level-6 from 'paxos ..
> > 'paxos;
> > >>> will stop at (end)
> > >>>
> > >>> I have removed a lot of interim log messages to save space.
> > >>>
> > >>> During each compaction the monitor process writes approximately 500-600
> > >>> MB of data to disk over a short period of time. These writes add up to
> > >>> tens of gigabytes per hour and hundreds of gigabytes per day.
> > >>>
> > >>> Monitor rocksdb and compaction options are default:
> > >>>
> > >>>     "mon_compact_on_bootstrap": "false",
> > >>>     "mon_compact_on_start": "false",
> > >>>     "mon_compact_on_trim": "true",
> > >>>     "mon_rocksdb_options":
> > >>>
> >
> "write_buffer_size=33554432,compression=kNoCompression,level_compaction_dynamic_level_bytes=true",
> > >>>
> > >>> Is this expected behavior? Is this something I can adjust in order to
> > >>> extend the system storage life?
> > >>>
> > >>> Best regards,
> > >>> Zakhar
> > >>>
> > >>
> > > _______________________________________________
> > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >
> >
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



