Hi Igor,

yes, I have some OSD settings set :-) Here is my ceph config dump. Those settings are from a Red Hat document about BlueStore devices. Maybe it is one of those settings causing this problem ("advanced mon_compact_on_trim false"?). I will test it this afternoon.

At the moment everything is semi-productive, and I first need to repair one OSD node: I think that is the reason the OSDs on that node crashed, and the OSD container now crashes with a dump while coming up. I need to replicate everything between all three nodes first; then I can take osd.2 offline and test your command. I will inform you later.

root@cd88-ceph-osdh-01:/# ceph config dump
WHO     MASK                      LEVEL     OPTION                                   VALUE           RO
global                            advanced  leveldb_max_open_files                   131072
global                            advanced  mon_compact_on_trim                      false
global                            dev       ms_crc_data                              false
global                            advanced  osd_deep_scrub_interval                  1209600.000000
global                            advanced  osd_max_scrubs                           16
global                            advanced  osd_scrub_load_threshold                 0.010000
global                            advanced  osd_scrub_max_interval                   1209600.000000
global                            advanced  osd_scrub_min_interval                   86400.000000
global                            advanced  perf                                     true
global                            advanced  rbd_readahead_disable_after_bytes        0
global                            advanced  rbd_readahead_max_bytes                  4194304
global                            advanced  rocksdb_perf                             true
global                            advanced  throttler_perf_counter                   false
mon                               advanced  auth_allow_insecure_global_id_reclaim    false
mon                               advanced  cluster_network                          10.50.50.0/24   *
mon                               advanced  mon_osd_down_out_interval                300
mon                               advanced  public_network                           10.50.50.0/24   *
mgr                               advanced  mgr/cephadm/container_init               True            *
mgr                               advanced  mgr/cephadm/device_enhanced_scan         true            *
mgr                               advanced  mgr/cephadm/migration_current            2               *
mgr                               advanced  mgr/cephadm/warn_on_stray_daemons        false           *
mgr                               advanced  mgr/cephadm/warn_on_stray_hosts          false           *
osd                               advanced  bluefs_sync_write                        true
osd                               dev       bluestore_cache_autotune                 true
osd                               dev       bluestore_cache_kv_ratio                 0.200000
osd                               dev       bluestore_cache_meta_ratio               0.800000
osd                               dev       bluestore_cache_size                     2147483648
osd                               dev       bluestore_cache_size_hdd                 2147483648
osd                               advanced  bluestore_csum_type                      none
osd                               dev       bluestore_extent_map_shard_max_size      200
osd                               dev       bluestore_extent_map_shard_min_size      50
osd                               dev       bluestore_extent_map_shard_target_size   100
osd                               advanced  bluestore_rocksdb_options                compression=kNoCompression,max_write_buffer_number=64,min_write_buffer_number_to_merge=32,recycle_log_file_num=64,compaction_style=kCompactionStyleLevel,write_buffer_size=4MB,target_file_size_base=4MB,max_background_compactions=64,level0_file_num_compaction_trigger=64,level0_slowdown_writes_trigger=128,level0_stop_writes_trigger=256,max_bytes_for_level_base=6GB,compaction_threads=32,flusher_threads=8,compaction_readahead_size=2MB   *
osd                               advanced  mon_osd_cache_size                       1024
osd                               dev       ms_crc_data                              false
osd                               advanced  osd_map_share_max_epochs                 5
osd                               advanced  osd_max_backfills                        1
osd                               dev       osd_max_pg_log_entries                   10
osd                               dev       osd_memory_cache_min                     3000000000
osd     host:cd133-ceph-osdh-01   basic     osd_memory_target                        5797322383
osd     host:cd133k-ceph-osdh-01  basic     osd_memory_target                        9402402385
osd     host:cd88-ceph-osdh-01    basic     osd_memory_target                        5797322096
osd                               advanced  osd_memory_target_autotune               true
osd                               dev       osd_min_pg_log_entries                   10
osd                               advanced  osd_op_num_shards                        8               *
osd                               advanced  osd_op_num_threads_per_shard             2               *
osd                               dev       osd_pg_log_dups_tracked                  10
osd                               dev       osd_pg_log_trim_min                      10
osd                               advanced  osd_recovery_max_active                  3
osd                               advanced  osd_recovery_max_single_start            1
osd                               advanced  osd_recovery_sleep                       0.000000

On Wed, Oct 6, 2021 at 12:55 PM Igor Fedotov <ifedotov@xxxxxxx> wrote:

> Jose,
>
> in fact 48 GB is way too much for a WAL drive - usually the write-ahead
> log tends to be 2-4 GB.
>
> But in your case it's ~150 GB, while the DB itself is very small (146 MB!!!):
>
> WAL   45 GiB   111 GiB   0 B   0 B   0 B   154 GiB   2400
> DB    0 B      164 MiB   0 B   0 B   0 B   146 MiB   30
>
> which means that there are some issues with RocksDB's WAL processing,
> which need some troubleshooting...
>
> Curious whether other OSDs are suffering from the same, and whether you
> have any custom settings for your OSD(s)?
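As a quick sanity check on my side: the "unable to allocate" errors from my first mail line up with this once the hex values are converted - the WAL device has less free space left than one allocation unit. A small sketch (the numbers are copied from the log output further down in this thread, nothing here touches the cluster):

```shell
# Convert the hex byte counts from the bluefs _allocate errors into KiB.
# "free 0xff000" and "unable to allocate 0x100000" are taken verbatim
# from the OSD log lines quoted below in this thread.
wal_free_bytes=$(( 0xff000 ))      # free space left on the WAL bdev
alloc_unit_bytes=$(( 0x100000 ))   # requested allocation unit (1 MiB)
echo "WAL free: $(( wal_free_bytes / 1024 )) KiB, requested: $(( alloc_unit_bytes / 1024 )) KiB"
```

So bluefs asks for a 1024 KiB extent while only 1020 KiB are free - hence every allocation on bdev 0 fails.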
> Additionally, you might want to try the following command to compact this
> specific OSD manually and check whether this normalizes the DB layout - the
> majority of the data has to be at the DB level, not in the WAL. Please share
> the resulting layout (reported by the "ceph daemon osd.2 bluefs stats"
> command) after the compaction has completed and the OSD has been restarted.
>
> The compaction command to be applied to an offline OSD: "ceph-kvstore-tool
> bluestore-kv <path-to-osd> compact"
>
> Even if the above works great, please refrain from applying that compaction
> to every OSD - let's see how that "compacted" OSD evolves. Would the WAL
> grow again or not?
>
> Thanks,
>
> Igor
>
>
> On 10/6/2021 1:35 PM, José H. Freidhof wrote:
>
> Hello Igor,
>
> yes, the NVMe WAL partitions for the BlueStore device groups are only
> 48 GB each:
>
> on each OSD node there is 1 NVMe drive of 1 TB, split into 20 LVs of 48 GB (WAL)
> on each OSD node there are 4 SSDs of 1 TB, each split into 5 LVs of 175 GB (block.db)
> on each OSD node there are 20 HDDs of 5.5 TB, each with 1 LV (block)
>
> Each BlueStore OSD has one partition on NVMe, SSD and HDD, as described in
> the documentation:
> https://docs.ceph.com/en/latest/rados/configuration/bluestore-config-ref/
>
> Is this too small, or can I adjust the maximum allocation on the WAL NVMe
> device in the Ceph configuration?
> I know that the SSD and NVMe volumes are too small for those 5.5 TB disks...
> they are only 1% of the rotational disk.
> I am new to Ceph and still (or always) learning, but we are in a little
> hurry because our other datastores are old and full.
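Before I take osd.2 offline I have noted down the sequence from Igor's mail above as a small helper that only prints the commands (so nothing runs by accident). The data path is my assumption - cephadm keeps the OSD dirs under /var/lib/ceph/<fsid>/osd.<id> inside the container, while classic installs use /var/lib/ceph/osd/ceph-<id> - so I will adjust it before running:

```shell
# Dry-run sketch of the offline compaction steps for one OSD.
# osd_path is an assumption (non-cephadm layout) -- adjust to your deployment.
osd_id=2
osd_path="/var/lib/ceph/osd/ceph-${osd_id}"
cat <<EOF
systemctl stop ceph-osd@${osd_id}
ceph-kvstore-tool bluestore-kv ${osd_path} compact
systemctl start ceph-osd@${osd_id}
ceph daemon osd.${osd_id} bluefs stats
EOF
```

Once replication between the three nodes has finished, I will run the printed commands one by one on the OSD host and share the new "bluefs stats" output.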
> root@cd88-ceph-osdh-01:/# ceph daemon osd.2 bluestore bluefs device info
> {
>     "dev": {
>         "device": "BDEV_WAL",
>         "total": 48318377984,
>         "free": 1044480,
>         "bluefs_used": 48317333504
>     },
>     "dev": {
>         "device": "BDEV_DB",
>         "total": 187904811008,
>         "free": 68757217280,
>         "bluefs_used": 119147593728
>     },
>     "dev": {
>         "device": "BDEV_SLOW",
>         "total": 6001172414464,
>         "free": 5624912359424,
>         "bluefs_used": 0,
>         "bluefs max available": 5624401231872
>     }
> }
> root@cd88-ceph-osdh-01:/# ceph daemon osd.2 bluefs stats
> 0 : device size 0xb3ffff000 : using 0xb3ff00000(45 GiB)
> 1 : device size 0x2bbfffe000 : using 0x1bbeb00000(111 GiB)
> 2 : device size 0x57541c00000 : using 0x579b592000(350 GiB)
> RocksDBBlueFSVolumeSelector: wal_total:45902462976, db_total:178509578240, slow_total:5701113793740, db_avail:103884521472
> Usage matrix:
> DEV/LEV   WAL       DB        SLOW      *         *         REAL      FILES
> LOG       124 MiB   2.3 GiB   0 B       0 B       0 B       7.5 MiB   1
> WAL       45 GiB    111 GiB   0 B       0 B       0 B       154 GiB   2400
> DB        0 B       164 MiB   0 B       0 B       0 B       146 MiB   30
> SLOW      0 B       0 B       0 B       0 B       0 B       0 B       0
> TOTALS    45 GiB    113 GiB   0 B       0 B       0 B       0 B       2431
> MAXIMUMS:
> LOG       124 MiB   2.3 GiB   0 B       0 B       0 B       17 MiB
> WAL       45 GiB    149 GiB   0 B       0 B       0 B       192 GiB
> DB        0 B       762 MiB   0 B       0 B       0 B       741 MiB
> SLOW      0 B       0 B       0 B       0 B       0 B       0 B
> TOTALS    45 GiB    150 GiB   0 B       0 B       0 B       0 B
>
> On Wed, Oct 6, 2021 at 11:45 AM Igor Fedotov <ifedotov@xxxxxxx> wrote:
>
>> Hey Jose,
>>
>> it looks like your WAL volume is out of space, which looks weird given
>> its capacity = 48 GB.
>>
>> Could you please share the output of the following commands:
>>
>> ceph daemon osd.N bluestore bluefs device info
>>
>> ceph daemon osd.N bluefs stats
>>
>>
>> Thanks,
>>
>> Igor
>>
>>
>> On 10/6/2021 12:24 PM, José H. Freidhof wrote:
>> > Hello everyone,
>> >
>> > we have a running Ceph Pacific 16.2.5 cluster, and I found the messages
>> > below in the service logs of the OSD daemons.
>> >
>> > We have three OSD nodes ..
>> > each node has 20 OSDs as BlueStore with NVMe/SSD/HDD.
>> >
>> > Is this a bug, or do I maybe have some settings wrong?
>> >
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:25.821+0000 7f38eebd4700 1 bluefs _allocate unable to allocate 0x100000 on bdev 0, allocator name bluefs-wal, allocator type hybrid, capacity 0xb40000000, block size 0x100000, free 0xff000, fragmentation 0, allocated 0x0
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:29.857+0000 7f38eebd4700 1 bluefs _allocate unable to allocate 0x100000 on bdev 0, allocator name bluefs-wal, allocator type hybrid, capacity 0xb40000000, block size 0x100000, free 0xff000, fragmentation 0, allocated 0x0
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.073+0000 7f38eebd4700 1 bluefs _allocate unable to allocate 0x400000 on bdev 0, allocator name bluefs-wal, allocator type hybrid, capacity 0xb40000000, block size 0x100000, free 0xff000, fragmentation 0, allocated 0x0
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.405+0000 7f38eebd4700 1 bluefs _allocate unable to allocate 0x100000 on bdev 0, allocator name bluefs-wal, allocator type hybrid, capacity 0xb40000000, block size 0x100000, free 0xff000, fragmentation 0, allocated 0x0
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.465+0000 7f38eebd4700 1 bluefs _allocate unable to allocate 0x100000 on bdev 0, allocator name bluefs-wal, allocator type hybrid, capacity 0xb40000000, block size 0x100000, free 0xff000, fragmentation 0, allocated 0x0
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.529+0000 7f38eebd4700 1 bluefs _allocate unable to allocate 0x100000 on bdev 0, allocator name bluefs-wal, allocator type hybrid, capacity 0xb40000000, block size 0x100000, free 0xff000, fragmentation 0, allocated 0x0
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.545+0000 7f38eebd4700 4 rocksdb: [db_impl/db_impl_write.cc:1668] [L] New memtable created with log file: #9588. Immutable memtables: 1.
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.545+0000 7f38eebd4700 1 bluefs _allocate unable to allocate 0x100000 on bdev 0, allocator name bluefs-wal, allocator type hybrid, capacity 0xb40000000, block size 0x100000, free 0xff000, fragmentation 0, allocated 0x0
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.545+0000 7f3905c02700 4 rocksdb: (Original Log Time 2021/10/06-09:17:30.547575) [db_impl/db_impl_compaction_flush.cc:2198] Calling FlushMemTableToOutputFile with column family [L], flush slots available 1, compaction slots available 1, flush slots scheduled 1, compaction slots scheduled 0
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.545+0000 7f3905c02700 4 rocksdb: [flush_job.cc:321] [L] [JOB 5709] Flushing memtable with next log file: 9587
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.545+0000 7f3905c02700 4 rocksdb: [flush_job.cc:321] [L] [JOB 5709] Flushing memtable with next log file: 9588
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.545+0000 7f3905c02700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1633511850547916, "job": 5709, "event": "flush_started", "num_memtables": 2, "num_entries": 4146, "num_deletes": 0, "total_data_size": 127203926, "memory_usage": 130479920, "flush_reason": "Write Buffer Full"}
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.545+0000 7f3905c02700 4 rocksdb: [flush_job.cc:350] [L] [JOB 5709] Level-0 flush table #9589: started
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.557+0000 7f3905c02700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1633511850559292, "cf_name": "L", "job": 5709, "event": "table_file_creation", "file_number": 9589, "file_size": 3249934, "table_properties": {"data_size": 3247855, "index_size": 1031, "index_partitions": 0, "top_level_index_size": 0, "index_key_is_user_key": 0, "index_value_is_delta_encoded": 0, "filter_size": 197, "raw_key_size": 1088, "raw_average_key_size": 16, "raw_value_size": 3246252, "raw_average_value_size": 47739, "num_data_blocks": 36, "num_entries": 68, "num_deletions": 32, "num_merge_operands": 0, "num_range_deletions": 0, "format_version": 0, "fixed_key_len": 0, "filter_policy": "rocksdb.BuiltinBloomFilter", "column_family_name": "L", "column_family_id": 10, "comparator": "leveldb.BytewiseComparator", "merge_operator": "nullptr", "prefix_extractor_name": "nullptr", "property_collectors": "[]", "compression": "NoCompression", "compression_options": "window_bits=-14; level=32767; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; ", "creation_time": 1633511730, "oldest_key_time": 1633511730, "file_creation_time": 1633511850}}
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.557+0000 7f3905c02700 4 rocksdb: [flush_job.cc:401] [L] [JOB 5709] Level-0 flush table #9589: 3249934 bytes OK
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.557+0000 7f3905c02700 4 rocksdb: (Original Log Time 2021/10/06-09:17:30.559362) [memtable_list.cc:447] [L] Level-0 commit table #9589 started
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.557+0000 7f3905c02700 4 rocksdb: (Original Log Time 2021/10/06-09:17:30.559583) [memtable_list.cc:503] [L] Level-0 commit table #9589: memtable #1 done
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.557+0000 7f3905c02700 4 rocksdb: (Original Log Time 2021/10/06-09:17:30.559586) [memtable_list.cc:503] [L] Level-0 commit table #9589: memtable #2 done
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.557+0000 7f3905c02700 4 rocksdb: (Original Log Time 2021/10/06-09:17:30.559601) EVENT_LOG_v1 {"time_micros": 1633511850559593, "job": 5709, "event": "flush_finished", "output_compression": "NoCompression", "lsm_state": [8, 1, 0, 0, 0, 0, 0], "immutable_memtables": 0}
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.557+0000 7f3905c02700 4 rocksdb: (Original Log Time 2021/10/06-09:17:30.559638) [db_impl/db_impl_compaction_flush.cc:205] [L] Level summary: files[8 1 0 0 0 0 0] max score 1.00
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.557+0000 7f38fb3ed700 4 rocksdb: [compaction/compaction_job.cc:1676] [L] [JOB 5710] Compacting 8@0 + 1@1 files to L1, score 1.00
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.557+0000 7f38fb3ed700 4 rocksdb: [compaction/compaction_job.cc:1680] [L] Compaction start summary: Base version 3090 Base level 0, inputs: [9589(3173KB) 9586(4793KB) 9583(1876KB) 9580(194KB) 9576(6417KB) 9573(1078KB) 9570(405KB) 9567(29KB)], [9564(1115KB)]
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.557+0000 7f38fb3ed700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1633511850559956, "job": 5710, "event": "compaction_started", "compaction_reason": "LevelL0FilesNum", "files_L0": [9589, 9586, 9583, 9580, 9576, 9573, 9570, 9567], "files_L1": [9564], "score": 1, "input_data_size": 19542092}
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.581+0000 7f38fb3ed700 4 rocksdb: [compaction/compaction_job.cc:1349] [L] [JOB 5710] Generated table #9590: 36 keys, 3249524 bytes
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.581+0000 7f38fb3ed700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1633511850582987, "cf_name": "L", "job": 5710, "event": "table_file_creation", "file_number": 9590, "file_size": 3249524, "table_properties": {"data_size": 3247449, "index_size": 1031, "index_partitions": 0, "top_level_index_size": 0, "index_key_is_user_key": 0, "index_value_is_delta_encoded": 0, "filter_size": 197, "raw_key_size": 576, "raw_average_key_size": 16, "raw_value_size": 3246252, "raw_average_value_size": 90173, "num_data_blocks": 36, "num_entries": 36, "num_deletions": 0, "num_merge_operands": 0, "num_range_deletions": 0, "format_version": 0, "fixed_key_len": 0, "filter_policy": "rocksdb.BuiltinBloomFilter", "column_family_name": "L", "column_family_id": 10, "comparator": "leveldb.BytewiseComparator", "merge_operator": "nullptr", "prefix_extractor_name": "nullptr", "property_collectors": "[]", "compression": "NoCompression", "compression_options": "window_bits=-14; level=32767; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; ", "creation_time": 1633471854, "oldest_key_time": 0, "file_creation_time": 1633511850}}
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.581+0000 7f38fb3ed700 4 rocksdb: [compaction/compaction_job.cc:1415] [L] [JOB 5710] Compacted 8@0 + 1@1 files to L1 => 3249524 bytes
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.581+0000 7f38fb3ed700 4 rocksdb: (Original Log Time 2021/10/06-09:17:30.583469) [compaction/compaction_job.cc:760] [L] compacted to: files[0 1 0 0 0 0 0] max score 0.01, MB/sec: 846.1 rd, 140.7 wr, level 1, files in(8, 1) out(1) MB in(17.5, 1.1) out(3.1), read-write-amplify(1.2) write-amplify(0.2) OK, records in: 376, records dropped: 340 output_compression: NoCompression
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.581+0000 7f38fb3ed700 4 rocksdb: (Original Log Time 2021/10/06-09:17:30.583498) EVENT_LOG_v1 {"time_micros": 1633511850583485, "job": 5710, "event": "compaction_finished", "compaction_time_micros": 23098, "compaction_time_cpu_micros": 20039, "output_level": 1, "num_output_files": 1, "total_output_size": 3249524, "num_input_records": 376, "num_output_records": 36, "num_subcompactions": 1, "output_compression": "NoCompression", "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [0, 1, 0, 0, 0, 0, 0]}
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.581+0000 7f38fb3ed700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1633511850583615, "job": 5710, "event": "table_file_deletion", "file_number": 9589}
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.581+0000 7f38fb3ed700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1633511850583648, "job": 5710, "event": "table_file_deletion", "file_number": 9586}
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.581+0000 7f38fb3ed700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1633511850583675, "job": 5710, "event": "table_file_deletion", "file_number": 9583}
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.581+0000 7f38fb3ed700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1633511850583709, "job": 5710, "event": "table_file_deletion", "file_number": 9580}
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.581+0000 7f38fb3ed700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1633511850583739, "job": 5710, "event": "table_file_deletion", "file_number": 9576}
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.581+0000 7f38fb3ed700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1633511850583769, "job": 5710, "event": "table_file_deletion", "file_number": 9573}
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.581+0000 7f38fb3ed700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1633511850583804, "job": 5710, "event": "table_file_deletion", "file_number": 9570}
>> > cd88-ceph-osdh-01 bash[6283]: debug 2021-10-06T09:17:30.581+0000 7f38fb3ed700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1633511850583835, "job": 5710, "event": "table_file_deletion", "file_number": 9567}
>> > _______________________________________________
>> > ceph-users mailing list -- ceph-users@xxxxxxx
>> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>
>
> --
>
> Kind regards,
>
> José H. Freidhof
>
> Reyerhütterstrasse 130b
> 41065 Mönchengladbach
> eMail: harald.freidhof@xxxxxxxxx
> mobil: +49 (0) 1523 – 717 7801

--
Kind regards,

José H. Freidhof
Reyerhütterstrasse 130b
41065 Mönchengladbach
eMail: harald.freidhof@xxxxxxxxx
mobil: +49 (0) 1523 – 717 7801
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx