Oh wow! I never bothered looking, because on our hardware the wear is so low: # iotop -ao -bn 2 -d 300 Total DISK READ : 0.00 B/s | Total DISK WRITE : 6.46 M/s Actual DISK READ: 0.00 B/s | Actual DISK WRITE: 6.47 M/s TID PRIO USER DISK READ DISK WRITE SWAPIN IO COMMAND 2230 be/4 ceph 0.00 B 1818.71 M 0.00 % 0.46 % ceph-mon --cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01 --mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65 [rocksdb:low0] 2256 be/4 ceph 0.00 B 19.27 M 0.00 % 0.43 % ceph-mon --cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01 --mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65 [safe_timer] 2250 be/4 ceph 0.00 B 42.38 M 0.00 % 0.26 % ceph-mon --cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01 --mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65 [fn_monstore] 2231 be/4 ceph 0.00 B 58.36 M 0.00 % 0.01 % ceph-mon --cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01 --mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65 [rocksdb:high0] 644 be/3 root 0.00 B 576.00 K 0.00 % 0.00 % [jbd2/sda3-8] 2225 be/4 ceph 0.00 B 128.00 K 0.00 % 0.00 % ceph-mon --cluster ceph --setuser ceph --setgroup ceph --foreground -i ceph-01 --mon-data /var/lib/ceph/mon/ceph-ceph-01 --public-addr 192.168.32.65 [log] 1637141 be/4 root 0.00 B 0.00 B 0.00 % 0.00 % [kworker/u113:2-flush-8:0] 1636453 be/4 root 0.00 B 0.00 B 0.00 % 0.00 % [kworker/u112:0-ceph0] 1560 be/4 root 0.00 B 20.00 K 0.00 % 0.00 % rsyslogd -n [in:imjournal] 1561 be/4 root 0.00 B 56.00 K 0.00 % 0.00 % rsyslogd -n [rs:main Q:Reg] 1.8GB every 5 minutes, thats 518GB per day. The 400G drives we have are rated 10DWPD and with the 6-drives RAID10 config this gives plenty of life-time. I guess this write load will kill any low-grade SSD (typical bood devices, even enterprise ones) specifically if its smaller drives and the controller doesn't reallocate cells according to remaining write endurance. I guess there was a reason for the recommendations by Dell. I always thought that the recent recommendation for MON store storage in the ceph docs are a "bit unrealistic", apparently both, in size and in performance (including endurance). Well, I guess you need to look for write intensive drives with decent specs. If you do, also go for sufficient size. This will absorb temporary usage peaks that can be very large and also provide extra endurance with SSDs with good controllers. I also think the recommendations on the ceph docs deserve a reality check. Best regards, ================= Frank Schilder AIT Risø Campus Bygning 109, rum S14 ________________________________________ From: Zakhar Kirpichenko <zakhar@xxxxxxxxx> Sent: Wednesday, October 11, 2023 4:30 PM To: Eugen Block Cc: Frank Schilder; ceph-users@xxxxxxx Subject: Re: Re: Ceph 16.2.x mon compactions, disk writes Eugen, Thanks for your response. May I ask what numbers you're referring to? I am not referring to monitor store.db sizes. I am specifically referring to writes monitors do to their store.db file by frequently rotating and replacing them with new versions during compactions. The size of the store.db remains more or less the same. This is a 300s iotop snippet, sorted by aggregated disk writes: Total DISK READ: 35.56 M/s | Total DISK WRITE: 23.89 M/s Current DISK READ: 35.64 M/s | Current DISK WRITE: 24.09 M/s TID PRIO USER DISK READ DISK WRITE> SWAPIN IO COMMAND 4919 be/4 167 16.75 M 2.24 G 0.00 % 1.34 % ceph-mon -n mon.ceph03 -f --setuser ceph --setgr~lt-mon-cluster-log-to-stderr=true [rocksdb:low0] 15122 be/4 167 0.00 B 652.91 M 0.00 % 0.27 % ceph-osd -n osd.31 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug [bstore_kv_sync] 17073 be/4 167 0.00 B 651.86 M 0.00 % 0.27 % ceph-osd -n osd.32 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug [bstore_kv_sync] 17268 be/4 167 0.00 B 490.86 M 0.00 % 0.18 % ceph-osd -n osd.25 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug [bstore_kv_sync] 18032 be/4 167 0.00 B 463.57 M 0.00 % 0.17 % ceph-osd -n osd.26 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug [bstore_kv_sync] 16855 be/4 167 0.00 B 402.86 M 0.00 % 0.15 % ceph-osd -n osd.22 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug [bstore_kv_sync] 17406 be/4 167 0.00 B 387.03 M 0.00 % 0.14 % ceph-osd -n osd.27 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug [bstore_kv_sync] 17932 be/4 167 0.00 B 375.42 M 0.00 % 0.13 % ceph-osd -n osd.29 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug [bstore_kv_sync] 18017 be/4 167 0.00 B 359.38 M 0.00 % 0.13 % ceph-osd -n osd.28 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug [bstore_kv_sync] 17420 be/4 167 0.00 B 332.83 M 0.00 % 0.12 % ceph-osd -n osd.23 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug [bstore_kv_sync] 17975 be/4 167 0.00 B 312.06 M 0.00 % 0.11 % ceph-osd -n osd.30 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug [bstore_kv_sync] 17273 be/4 167 0.00 B 303.49 M 0.00 % 0.11 % ceph-osd -n osd.24 -f --setuser ceph --setgroup ~default-log-stderr-prefix=debug [bstore_kv_sync] Not a good example, because sometimes mon writes more intensively, but it is very apparent that thread 4919 of the monitor process is the top disk writer in the system. This is the mon thread producing lots of writes: 4919 167 20 0 2031116 1.1g 10652 S 0.0 0.3 288:48.65 rocksdb:low0 Then with a combination of lsof and sysdig I determine that the writes are being made to /var/lib/ceph/mon/ceph-ceph03/store.db/*.sst, i.e. the mon's rocksdb store: ceph-mon 4838 167 200r REG 253,11 67319253 14812899 /var/lib/ceph/mon/ceph-ceph03/store.db/3677146.sst ceph-mon 4838 167 203r REG 253,11 67228736 14813270 /var/lib/ceph/mon/ceph-ceph03/store.db/3677147.sst ceph-mon 4838 167 205r REG 253,11 67243212 14813275 /var/lib/ceph/mon/ceph-ceph03/store.db/3677148.sst ceph-mon 4838 167 208r REG 253,11 67247953 14813316 /var/lib/ceph/mon/ceph-ceph03/store.db/3677149.sst ceph-mon 4838 167 220r REG 253,11 67261659 14813332 /var/lib/ceph/mon/ceph-ceph03/store.db/3677150.sst ceph-mon 4838 167 221r REG 253,11 67242500 14813345 /var/lib/ceph/mon/ceph-ceph03/store.db/3677151.sst ceph-mon 4838 167 224r REG 253,11 67264969 14813348 /var/lib/ceph/mon/ceph-ceph03/store.db/3677152.sst ceph-mon 4838 167 228r REG 253,11 64346933 14813381 /var/lib/ceph/mon/ceph-ceph03/store.db/3677153.sst By matching iotop and sysdig write records to mon's log entries, I see that the writes happen during "manual compaction" events - whatever they are, because there's no documentation on this whatsoever, and each time around 0.56GB is being written to disk to a new set of *.sst files, which is the total size of the store.db. Looks like from time to time the monitor just reads its store.db and writes it out to a new set of files, as the file names "numbers" increase with each write: ceph-mon 4838 167 175r REG 253,11 67220863 14812310 /var/lib/ceph/mon/ceph-ceph03/store.db/3677167.sst ceph-mon 4838 167 200r REG 253,11 67358627 14812899 /var/lib/ceph/mon/ceph-ceph03/store.db/3677168.sst ceph-mon 4838 167 203r REG 253,11 67277978 14813270 /var/lib/ceph/mon/ceph-ceph03/store.db/3677169.sst ceph-mon 4838 167 205r REG 253,11 67256312 14813275 /var/lib/ceph/mon/ceph-ceph03/store.db/3677170.sst ceph-mon 4838 167 208r REG 253,11 67226761 14813316 /var/lib/ceph/mon/ceph-ceph03/store.db/3677171.sst ceph-mon 4838 167 220r REG 253,11 67258798 14813332 /var/lib/ceph/mon/ceph-ceph03/store.db/3677172.sst ceph-mon 4838 167 221r REG 253,11 67224665 14813345 /var/lib/ceph/mon/ceph-ceph03/store.db/3677173.sst ceph-mon 4838 167 224r REG 253,11 67224123 14813348 /var/lib/ceph/mon/ceph-ceph03/store.db/3677174.sst ceph-mon 4838 167 228r REG 253,11 62195349 14813381 /var/lib/ceph/mon/ceph-ceph03/store.db/3677175.sst I hope this clears up the situation. Do you observe this behavior in your clusters? Can you please check whether your mons do something similar and store.db/*.sst change often? /Z On Wed, 11 Oct 2023 at 16:22, Eugen Block <eblock@xxxxxx<mailto:eblock@xxxxxx>> wrote: That all looks normal to me, to be honest. Can you show some details how you calculate the "hundreds of GB per day"? I see similar stats as Frank on different clusters with different client IO. Zitat von Zakhar Kirpichenko <zakhar@xxxxxxxxx<mailto:zakhar@xxxxxxxxx>>: > Sure, nothing unusual there: > > ------- > > cluster: > id: 3f50555a-ae2a-11eb-a2fc-ffde44714d86 > health: HEALTH_OK > > services: > mon: 5 daemons, quorum ceph01,ceph03,ceph04,ceph05,ceph02 (age 2w) > mgr: ceph01.vankui(active, since 12d), standbys: ceph02.shsinf > osd: 96 osds: 96 up (since 2w), 95 in (since 3w) > > data: > pools: 10 pools, 2400 pgs > objects: 6.23M objects, 16 TiB > usage: 61 TiB used, 716 TiB / 777 TiB avail > pgs: 2396 active+clean > 3 active+clean+scrubbing+deep > 1 active+clean+scrubbing > > io: > client: 2.7 GiB/s rd, 27 MiB/s wr, 46.95k op/s rd, 2.17k op/s wr > > ------- > > Please disregard the big read number, a customer is running a > read-intensive job. Mon store writes keep happening when the cluster is > much more quiet, thus I think that intensive reads have no effect on the > mons. > > Mgr: > > "always_on_modules": [ > "balancer", > "crash", > "devicehealth", > "orchestrator", > "pg_autoscaler", > "progress", > "rbd_support", > "status", > "telemetry", > "volumes" > ], > "enabled_modules": [ > "cephadm", > "dashboard", > "iostat", > "prometheus", > "restful" > ], > > ------- > > /Z > > > On Wed, 11 Oct 2023 at 14:50, Eugen Block <eblock@xxxxxx<mailto:eblock@xxxxxx>> wrote: > >> Can you add some more details as requested by Frank? Which mgr modules >> are enabled? What's the current 'ceph -s' output? >> >> > Is autoscaler running and doing stuff? >> > Is balancer running and doing stuff? >> > Is backfill going on? >> > Is recovery going on? >> > Is your ceph version affected by the "excessive logging to MON >> > store" issue that was present starting with pacific but should have >> > been addressed >> >> >> Zitat von Zakhar Kirpichenko <zakhar@xxxxxxxxx<mailto:zakhar@xxxxxxxxx>>: >> >> > We don't use CephFS at all and don't have RBD snapshots apart from some >> > cloning for Openstack images. >> > >> > The size of mon stores isn't an issue, it's < 600 MB. But it gets >> > overwritten often causing lots of disk writes, and that is an issue for >> us. >> > >> > /Z >> > >> > On Wed, 11 Oct 2023 at 14:37, Eugen Block <eblock@xxxxxx<mailto:eblock@xxxxxx>> wrote: >> > >> >> Do you use many snapshots (rbd or cephfs)? That can cause a heavy >> >> monitor usage, we've seen large mon stores on customer clusters with >> >> rbd mirroring on snapshot basis. In a healthy cluster they have mon >> >> stores of around 2GB in size. >> >> >> >> >> @Eugen: Was there not an option to limit logging to the MON store? >> >> >> >> I don't recall at the moment, worth checking tough. >> >> >> >> Zitat von Zakhar Kirpichenko <zakhar@xxxxxxxxx<mailto:zakhar@xxxxxxxxx>>: >> >> >> >> > Thank you, Frank. >> >> > >> >> > The cluster is healthy, operating normally, nothing unusual is going >> on. >> >> We >> >> > observe lots of writes by mon processes into mon rocksdb stores, >> >> > specifically: >> >> > >> >> > /var/lib/ceph/mon/ceph-cephXX/store.db: >> >> > 65M 3675511.sst >> >> > 65M 3675512.sst >> >> > 65M 3675513.sst >> >> > 65M 3675514.sst >> >> > 65M 3675515.sst >> >> > 65M 3675516.sst >> >> > 65M 3675517.sst >> >> > 65M 3675518.sst >> >> > 62M 3675519.sst >> >> > >> >> > The site of the files is not huge, but monitors rotate and write out >> >> these >> >> > files often, sometimes several times per minute, resulting in lots of >> >> data >> >> > written to disk. The writes coincide with "manual compaction" events >> >> logged >> >> > by the monitors, for example: >> >> > >> >> > debug 2023-10-11T11:10:10.483+0000 7f48a3a9b700 4 rocksdb: >> >> > [compaction/compaction_job.cc:1676] [default] [JOB 70854] Compacting >> 1@5 >> >> + >> >> > 9@6 files to L6, score -1.00 >> >> > debug 2023-10-11T11:10:10.483+0000 7f48a3a9b700 4 rocksdb: >> EVENT_LOG_v1 >> >> > {"time_micros": 1697022610487624, "job": 70854, "event": >> >> > "compaction_started", "compaction_reason": "ManualCompaction", >> >> "files_L5": >> >> > [3675543], "files_L6": [3675533, 3675534, 3675535, 3675536, 3675537, >> >> > 3675538, 3675539, 3675540, 3675541], "score": -1, "input_data_size": >> >> > 601117031} >> >> > debug 2023-10-11T11:10:10.619+0000 7f48a3a9b700 4 rocksdb: >> >> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated >> table >> >> > #3675544: 2015 keys, 67287115 bytes >> >> > debug 2023-10-11T11:10:10.763+0000 7f48a3a9b700 4 rocksdb: >> >> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated >> table >> >> > #3675545: 24343 keys, 67336225 bytes >> >> > debug 2023-10-11T11:10:10.899+0000 7f48a3a9b700 4 rocksdb: >> >> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated >> table >> >> > #3675546: 1196 keys, 67225813 bytes >> >> > debug 2023-10-11T11:10:11.035+0000 7f48a3a9b700 4 rocksdb: >> >> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated >> table >> >> > #3675547: 1049 keys, 67252678 bytes >> >> > debug 2023-10-11T11:10:11.167+0000 7f48a3a9b700 4 rocksdb: >> >> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated >> table >> >> > #3675548: 1081 keys, 67216638 bytes >> >> > debug 2023-10-11T11:10:11.303+0000 7f48a3a9b700 4 rocksdb: >> >> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated >> table >> >> > #3675549: 1196 keys, 67245376 bytes >> >> > debug 2023-10-11T11:10:12.023+0000 7f48a3a9b700 4 rocksdb: >> >> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated >> table >> >> > #3675550: 1195 keys, 67246813 bytes >> >> > debug 2023-10-11T11:10:13.059+0000 7f48a3a9b700 4 rocksdb: >> >> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated >> table >> >> > #3675551: 1205 keys, 67223302 bytes >> >> > debug 2023-10-11T11:10:13.903+0000 7f48a3a9b700 4 rocksdb: >> >> > [compaction/compaction_job.cc:1349] [default] [JOB 70854] Generated >> table >> >> > #3675552: 1312 keys, 56416011 bytes >> >> > debug 2023-10-11T11:10:13.911+0000 7f48a3a9b700 4 rocksdb: >> >> > [compaction/compaction_job.cc:1415] [default] [JOB 70854] Compacted >> 1@5 >> >> + >> >> > 9@6 files to L6 => 594449971 bytes >> >> > debug 2023-10-11T11:10:13.915+0000 7f48a3a9b700 4 rocksdb: (Original >> Log >> >> > Time 2023/10/11-11:10:13.920991) [compaction/compaction_job.cc:760] >> >> > [default] compacted to: base level 5 level multiplier 10.00 max bytes >> >> base >> >> > 268435456 files[0 0 0 0 0 0 9] max score 0.00, MB/sec: 175.8 rd, 173.9 >> >> wr, >> >> > level 6, files in(1, 9) out(9) MB in(0.3, 572.9) out(566.9), >> >> > read-write-amplify(3434.6) write-amplify(1707.7) OK, records in: >> 35108, >> >> > records dropped: 516 output_compression: NoCompression >> >> > debug 2023-10-11T11:10:13.915+0000 7f48a3a9b700 4 rocksdb: (Original >> Log >> >> > Time 2023/10/11-11:10:13.921010) EVENT_LOG_v1 {"time_micros": >> >> > 1697022613921002, "job": 70854, "event": "compaction_finished", >> >> > "compaction_time_micros": 3418822, "compaction_time_cpu_micros": >> 785454, >> >> > "output_level": 6, "num_output_files": 9, "total_output_size": >> 594449971, >> >> > "num_input_records": 35108, "num_output_records": 34592, >> >> > "num_subcompactions": 1, "output_compression": "NoCompression", >> >> > "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, >> >> > "lsm_state": [0, 0, 0, 0, 0, 0, 9]} >> >> > >> >> > The log even mentions the huge write multiplication. I wonder whether >> >> this >> >> > is normal and what can be done about it. >> >> > >> >> > /Z >> >> > >> >> > On Wed, 11 Oct 2023 at 13:55, Frank Schilder <frans@xxxxxx<mailto:frans@xxxxxx>> wrote: >> >> > >> >> >> I need to ask here: where exactly do you observe the hundreds of GB >> >> >> written per day? Are the mon logs huge? Is it the mon store? Is your >> >> >> cluster unhealthy? >> >> >> >> >> >> We have an octopus cluster with 1282 OSDs, 1650 ceph fs clients and >> >> about >> >> >> 800 librbd clients. Per week our mon logs are about 70M, the cluster >> >> logs >> >> >> about 120M , the audit logs about 70M and I see between 100-200Kb/s >> >> writes >> >> >> to the mon store. That's in the lower-digit GB range per day. >> Hundreds >> >> of >> >> >> GB per day sound completely over the top on a healthy cluster, unless >> >> you >> >> >> have MGR modules changing the OSD/cluster map continuously. >> >> >> >> >> >> Is autoscaler running and doing stuff? >> >> >> Is balancer running and doing stuff? >> >> >> Is backfill going on? >> >> >> Is recovery going on? >> >> >> Is your ceph version affected by the "excessive logging to MON store" >> >> >> issue that was present starting with pacific but should have been >> >> addressed >> >> >> by now? >> >> >> >> >> >> @Eugen: Was there not an option to limit logging to the MON store? >> >> >> >> >> >> For information to readers, we followed old recommendations from a >> Dell >> >> >> white paper for building a ceph cluster and have a 1TB Raid10 array >> on >> >> 6x >> >> >> write intensive SSDs for the MON stores. After 5 years we are below >> 10% >> >> >> wear. Average size of the MON store for a healthy cluster is 500M-1G, >> >> but >> >> >> we have seen this ballooning to 100+GB in degraded conditions. >> >> >> >> >> >> Best regards, >> >> >> ================= >> >> >> Frank Schilder >> >> >> AIT Risø Campus >> >> >> Bygning 109, rum S14 >> >> >> >> >> >> ________________________________________ >> >> >> From: Zakhar Kirpichenko <zakhar@xxxxxxxxx<mailto:zakhar@xxxxxxxxx>> >> >> >> Sent: Wednesday, October 11, 2023 12:00 PM >> >> >> To: Eugen Block >> >> >> Cc: ceph-users@xxxxxxx<mailto:ceph-users@xxxxxxx> >> >> >> Subject: Re: Ceph 16.2.x mon compactions, disk writes >> >> >> >> >> >> Thank you, Eugen. >> >> >> >> >> >> I'm interested specifically to find out whether the huge amount of >> data >> >> >> written by monitors is expected. It is eating through the endurance >> of >> >> our >> >> >> system drives, which were not specced for high DWPD/TBW, as this is >> not >> >> a >> >> >> documented requirement, and monitors produce hundreds of gigabytes of >> >> >> writes per day. I am looking for ways to reduce the amount of >> writes, if >> >> >> possible. >> >> >> >> >> >> /Z >> >> >> >> >> >> On Wed, 11 Oct 2023 at 12:41, Eugen Block <eblock@xxxxxx<mailto:eblock@xxxxxx>> wrote: >> >> >> >> >> >> > Hi, >> >> >> > >> >> >> > what you report is the expected behaviour, at least I see the same >> on >> >> >> > all clusters. I can't answer why the compaction is required that >> >> >> > often, but you can control the log level of the rocksdb output: >> >> >> > >> >> >> > ceph config set mon debug_rocksdb 1/5 (default is 4/5) >> >> >> > >> >> >> > This reduces the log entries and you wouldn't see the manual >> >> >> > compaction logs anymore. There are a couple more rocksdb options >> but I >> >> >> > probably wouldn't change too much, only if you know what you're >> doing. >> >> >> > Maybe Igor can comment if some other tuning makes sense here. >> >> >> > >> >> >> > Regards, >> >> >> > Eugen >> >> >> > >> >> >> > Zitat von Zakhar Kirpichenko <zakhar@xxxxxxxxx<mailto:zakhar@xxxxxxxxx>>: >> >> >> > >> >> >> > > Any input from anyone, please? >> >> >> > > >> >> >> > > On Tue, 10 Oct 2023 at 09:44, Zakhar Kirpichenko < >> zakhar@xxxxxxxxx<mailto:zakhar@xxxxxxxxx>> >> >> >> > wrote: >> >> >> > > >> >> >> > >> Any input from anyone, please? >> >> >> > >> >> >> >> > >> It's another thing that seems to be rather poorly documented: >> it's >> >> >> > unclear >> >> >> > >> what to expect, what 'normal' behavior should be, and what can >> be >> >> done >> >> >> > >> about the huge amount of writes by monitors. >> >> >> > >> >> >> >> > >> /Z >> >> >> > >> >> >> >> > >> On Mon, 9 Oct 2023 at 12:40, Zakhar Kirpichenko < >> zakhar@xxxxxxxxx<mailto:zakhar@xxxxxxxxx>> >> >> >> > wrote: >> >> >> > >> >> >> >> > >>> Hi, >> >> >> > >>> >> >> >> > >>> Monitors in our 16.2.14 cluster appear to quite often run >> "manual >> >> >> > >>> compaction" tasks: >> >> >> > >>> >> >> >> > >>> debug 2023-10-09T09:30:53.888+0000 7f48a329a700 4 rocksdb: >> >> >> > EVENT_LOG_v1 >> >> >> > >>> {"time_micros": 1696843853892760, "job": 64225, "event": >> >> >> > "flush_started", >> >> >> > >>> "num_memtables": 1, "num_entries": 715, "num_deletes": 251, >> >> >> > >>> "total_data_size": 3870352, "memory_usage": 3886744, >> >> "flush_reason": >> >> >> > >>> "Manual Compaction"} >> >> >> > >>> debug 2023-10-09T09:30:53.904+0000 7f4899286700 4 rocksdb: >> >> >> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual >> >> >> compaction >> >> >> > >>> starting >> >> >> > >>> debug 2023-10-09T09:30:53.908+0000 7f48a3a9b700 4 rocksdb: >> >> (Original >> >> >> > Log >> >> >> > >>> Time 2023/10/09-09:30:53.910204) >> >> >> > [db_impl/db_impl_compaction_flush.cc:2516] >> >> >> > >>> [default] Manual compaction from level-0 to level-5 from >> 'paxos .. >> >> >> > 'paxos; >> >> >> > >>> will stop at (end) >> >> >> > >>> debug 2023-10-09T09:30:53.908+0000 7f4899286700 4 rocksdb: >> >> >> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual >> >> >> compaction >> >> >> > >>> starting >> >> >> > >>> debug 2023-10-09T09:30:53.908+0000 7f4899286700 4 rocksdb: >> >> >> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual >> >> >> compaction >> >> >> > >>> starting >> >> >> > >>> debug 2023-10-09T09:30:53.908+0000 7f4899286700 4 rocksdb: >> >> >> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual >> >> >> compaction >> >> >> > >>> starting >> >> >> > >>> debug 2023-10-09T09:30:53.908+0000 7f4899286700 4 rocksdb: >> >> >> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual >> >> >> compaction >> >> >> > >>> starting >> >> >> > >>> debug 2023-10-09T09:30:53.908+0000 7f4899286700 4 rocksdb: >> >> >> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual >> >> >> compaction >> >> >> > >>> starting >> >> >> > >>> debug 2023-10-09T09:30:53.908+0000 7f48a3a9b700 4 rocksdb: >> >> (Original >> >> >> > Log >> >> >> > >>> Time 2023/10/09-09:30:53.911004) >> >> >> > [db_impl/db_impl_compaction_flush.cc:2516] >> >> >> > >>> [default] Manual compaction from level-5 to level-6 from >> 'paxos .. >> >> >> > 'paxos; >> >> >> > >>> will stop at (end) >> >> >> > >>> debug 2023-10-09T09:32:08.956+0000 7f48a329a700 4 rocksdb: >> >> >> > EVENT_LOG_v1 >> >> >> > >>> {"time_micros": 1696843928961390, "job": 64228, "event": >> >> >> > "flush_started", >> >> >> > >>> "num_memtables": 1, "num_entries": 1580, "num_deletes": 502, >> >> >> > >>> "total_data_size": 8404605, "memory_usage": 8465840, >> >> "flush_reason": >> >> >> > >>> "Manual Compaction"} >> >> >> > >>> debug 2023-10-09T09:32:08.972+0000 7f4899286700 4 rocksdb: >> >> >> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual >> >> >> compaction >> >> >> > >>> starting >> >> >> > >>> debug 2023-10-09T09:32:08.976+0000 7f48a3a9b700 4 rocksdb: >> >> (Original >> >> >> > Log >> >> >> > >>> Time 2023/10/09-09:32:08.977739) >> >> >> > [db_impl/db_impl_compaction_flush.cc:2516] >> >> >> > >>> [default] Manual compaction from level-0 to level-5 from 'logm >> .. >> >> >> > 'logm; >> >> >> > >>> will stop at (end) >> >> >> > >>> debug 2023-10-09T09:32:08.976+0000 7f4899286700 4 rocksdb: >> >> >> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual >> >> >> compaction >> >> >> > >>> starting >> >> >> > >>> debug 2023-10-09T09:32:08.976+0000 7f4899286700 4 rocksdb: >> >> >> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual >> >> >> compaction >> >> >> > >>> starting >> >> >> > >>> debug 2023-10-09T09:32:08.976+0000 7f4899286700 4 rocksdb: >> >> >> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual >> >> >> compaction >> >> >> > >>> starting >> >> >> > >>> debug 2023-10-09T09:32:08.976+0000 7f4899286700 4 rocksdb: >> >> >> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual >> >> >> compaction >> >> >> > >>> starting >> >> >> > >>> debug 2023-10-09T09:32:08.976+0000 7f4899286700 4 rocksdb: >> >> >> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual >> >> >> compaction >> >> >> > >>> starting >> >> >> > >>> debug 2023-10-09T09:32:08.976+0000 7f48a3a9b700 4 rocksdb: >> >> (Original >> >> >> > Log >> >> >> > >>> Time 2023/10/09-09:32:08.978512) >> >> >> > [db_impl/db_impl_compaction_flush.cc:2516] >> >> >> > >>> [default] Manual compaction from level-5 to level-6 from 'logm >> .. >> >> >> > 'logm; >> >> >> > >>> will stop at (end) >> >> >> > >>> debug 2023-10-09T09:32:12.764+0000 7f4899286700 4 rocksdb: >> >> >> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual >> >> >> compaction >> >> >> > >>> starting >> >> >> > >>> debug 2023-10-09T09:32:12.764+0000 7f4899286700 4 rocksdb: >> >> >> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual >> >> >> compaction >> >> >> > >>> starting >> >> >> > >>> debug 2023-10-09T09:32:12.764+0000 7f4899286700 4 rocksdb: >> >> >> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual >> >> >> compaction >> >> >> > >>> starting >> >> >> > >>> debug 2023-10-09T09:32:12.764+0000 7f4899286700 4 rocksdb: >> >> >> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual >> >> >> compaction >> >> >> > >>> starting >> >> >> > >>> debug 2023-10-09T09:32:12.764+0000 7f4899286700 4 rocksdb: >> >> >> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual >> >> >> compaction >> >> >> > >>> starting >> >> >> > >>> debug 2023-10-09T09:32:12.764+0000 7f4899286700 4 rocksdb: >> >> >> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual >> >> >> compaction >> >> >> > >>> starting >> >> >> > >>> debug 2023-10-09T09:33:29.028+0000 7f48a329a700 4 rocksdb: >> >> >> > EVENT_LOG_v1 >> >> >> > >>> {"time_micros": 1696844009033151, "job": 64231, "event": >> >> >> > "flush_started", >> >> >> > >>> "num_memtables": 1, "num_entries": 1430, "num_deletes": 251, >> >> >> > >>> "total_data_size": 8975535, "memory_usage": 9035920, >> >> "flush_reason": >> >> >> > >>> "Manual Compaction"} >> >> >> > >>> debug 2023-10-09T09:33:29.044+0000 7f4899286700 4 rocksdb: >> >> >> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual >> >> >> compaction >> >> >> > >>> starting >> >> >> > >>> debug 2023-10-09T09:33:29.048+0000 7f48a3a9b700 4 rocksdb: >> >> (Original >> >> >> > Log >> >> >> > >>> Time 2023/10/09-09:33:29.049585) >> >> >> > [db_impl/db_impl_compaction_flush.cc:2516] >> >> >> > >>> [default] Manual compaction from level-0 to level-5 from >> 'paxos .. >> >> >> > 'paxos; >> >> >> > >>> will stop at (end) >> >> >> > >>> debug 2023-10-09T09:33:29.048+0000 7f4899286700 4 rocksdb: >> >> >> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual >> >> >> compaction >> >> >> > >>> starting >> >> >> > >>> debug 2023-10-09T09:33:29.048+0000 7f4899286700 4 rocksdb: >> >> >> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual >> >> >> compaction >> >> >> > >>> starting >> >> >> > >>> debug 2023-10-09T09:33:29.048+0000 7f4899286700 4 rocksdb: >> >> >> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual >> >> >> compaction >> >> >> > >>> starting >> >> >> > >>> debug 2023-10-09T09:33:29.048+0000 7f4899286700 4 rocksdb: >> >> >> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual >> >> >> compaction >> >> >> > >>> starting >> >> >> > >>> debug 2023-10-09T09:33:29.048+0000 7f4899286700 4 rocksdb: >> >> >> > >>> [db_impl/db_impl_compaction_flush.cc:1443] [default] Manual >> >> >> compaction >> >> >> > >>> starting >> >> >> > >>> debug 2023-10-09T09:33:29.048+0000 7f48a3a9b700 4 rocksdb: >> >> (Original >> >> >> > Log >> >> >> > >>> Time 2023/10/09-09:33:29.050355) >> >> >> > [db_impl/db_impl_compaction_flush.cc:2516] >> >> >> > >>> [default] Manual compaction from level-5 to level-6 from >> 'paxos .. >> >> >> > 'paxos; >> >> >> > >>> will stop at (end) >> >> >> > >>> >> >> >> > >>> I have removed a lot of interim log messages to save space. >> >> >> > >>> >> >> >> > >>> During each compaction the monitor process writes approximately >> >> >> 500-600 >> >> >> > >>> MB of data to disk over a short period of time. These writes >> add >> >> up >> >> >> to >> >> >> > tens >> >> >> > >>> of gigabytes per hour and hundreds of gigabytes per day. >> >> >> > >>> >> >> >> > >>> Monitor rocksdb and compaction options are default: >> >> >> > >>> >> >> >> > >>> "mon_compact_on_bootstrap": "false", >> >> >> > >>> "mon_compact_on_start": "false", >> >> >> > >>> "mon_compact_on_trim": "true", >> >> >> > >>> "mon_rocksdb_options": >> >> >> > >>> >> >> >> > >> >> >> >> >> >> "write_buffer_size=33554432,compression=kNoCompression,level_compaction_dynamic_level_bytes=true", >> >> >> > >>> >> >> >> > >>> Is this expected behavior? Is this something I can adjust in >> >> order to >> >> >> > >>> extend the system storage life? >> >> >> > >>> >> >> >> > >>> Best regards, >> >> >> > >>> Zakhar >> >> >> > >>> >> >> >> > >> >> >> >> > > _______________________________________________ >> >> >> > > ceph-users mailing list -- ceph-users@xxxxxxx<mailto:ceph-users@xxxxxxx> >> >> >> > > To unsubscribe send an email to ceph-users-leave@xxxxxxx<mailto:ceph-users-leave@xxxxxxx> >> >> >> > >> >> >> > >> >> >> > _______________________________________________ >> >> >> > ceph-users mailing list -- ceph-users@xxxxxxx<mailto:ceph-users@xxxxxxx> >> >> >> > To unsubscribe send an email to ceph-users-leave@xxxxxxx<mailto:ceph-users-leave@xxxxxxx> >> >> >> > >> >> >> _______________________________________________ >> >> >> ceph-users mailing list -- ceph-users@xxxxxxx<mailto:ceph-users@xxxxxxx> >> >> >> To unsubscribe send an email to ceph-users-leave@xxxxxxx<mailto:ceph-users-leave@xxxxxxx> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> _______________________________________________ ceph-users mailing list -- ceph-users@xxxxxxx To unsubscribe send an email to ceph-users-leave@xxxxxxx