hi Sage,
    I wrote 100% of the size of a 4T rbd, and the log never seems to be
compacted... the perf counter output is as follows:

    "bluefs": {
        "gift_bytes": 0,
        "reclaim_bytes": 0,
        "db_total_bytes": 16106119168,
        "db_used_bytes": 16106119168,
        "wal_total_bytes": 5368705024,
        "wal_used_bytes": 5368705024,
        "slow_total_bytes": 79153848320,
        "slow_used_bytes": 2502942720,
        "num_files": 33,
        "log_bytes": 4380053504,
        "log_compactions": 0,
        "logged_bytes": 4378685440,
        "files_written_wal": 26,
        "files_written_sst": 206,
        "bytes_written_wal": 8219878782,
        "bytes_written_sst": 12998417672
    },

my ceph version is 12.0.2, and my rocksdb options are:

bluestore_rocksdb_options = compression=kNoCompression, max_write_buffer_number=2, min_write_buffer_number_to_merge=1, write_buffer_size=268435456, writable_file_max_buffer_size=0

Could a missing rocksdb option be causing the problem?

Thanks & Regards!

2017-08-29 21:45 GMT+08:00 Sage Weil <sweil@xxxxxxxxxx>:
> On Tue, 29 Aug 2017, zengran zhang wrote:
>> thanks! I set debug_bluefs to 10, but did not see the
>> "_should_compact_log" message while fio was writing to the rbd... so how
>> often will BlueFS::sync_metadata() be called?
>
> Not that often.  We only need to write to the bluefs log when new rocksdb
> files are created.. and they are pretty big.  So it will need to age for
> quite a while before anything happens.
>
> You can get some sense of it by watching the 'ceph daemonperf osd.0'
> output on a running OSD and watching the bluefs.wal column.  There is also
> a bluefs.log_bytes counter in the full perf dump, although you won't see
> the estimated size to compare it against.
>
> sage
>
>>
>> 2017-08-29 21:19 GMT+08:00 Sage Weil <sweil@xxxxxxxxxx>:
>> > On Tue, 29 Aug 2017, zengran zhang wrote:
>> >> hi Sage,
>> >>   I want to ask when the bluefs log will be compacted? I am only sure
>> >> it will be compacted when bluefs is unmounted...
>> >>   I see the log being compacted when rocksdb calls Dir.Fsync(), but I
>> >> want to know how to trigger this...
>> >
>> > There is a heuristic for when it gets "too big":
>> >
>> > bool BlueFS::_should_compact_log()
>> > {
>> >   uint64_t current = log_writer->file->fnode.size;
>> >   uint64_t expected = _estimate_log_size();
>> >   float ratio = (float)current / (float)expected;
>> >   dout(10) << __func__ << " current 0x" << std::hex << current
>> >            << " expected " << expected << std::dec
>> >            << " ratio " << ratio
>> >            << (new_log ? " (async compaction in progress)" : "")
>> >            << dendl;
>> >   if (new_log ||
>> >       current < cct->_conf->bluefs_log_compact_min_size ||
>> >       ratio < cct->_conf->bluefs_log_compact_min_ratio) {
>> >     return false;
>> >   }
>> >   return true;
>> > }
>> >
>> > and the estimate for the (compacted) size is
>> >
>> > uint64_t BlueFS::_estimate_log_size()
>> > {
>> >   int avg_dir_size = 40;  // fixme
>> >   int avg_file_size = 12;
>> >   uint64_t size = 4096 * 2;
>> >   size += file_map.size() * (1 + sizeof(bluefs_fnode_t));
>> >   for (auto& p : block_all)
>> >     size += p.num_intervals() * (1 + 1 + sizeof(uint64_t) * 2);
>> >   size += dir_map.size() + (1 + avg_dir_size);
>> >   size += file_map.size() * (1 + avg_dir_size + avg_file_size);
>> >   return ROUND_UP_TO(size, super.block_size);
>> > }
>> >
>> > The default min_ratio is 5... so we compact when it's ~5x bigger than it
>> > needs to be.
>> >
>> > sage
>> >
>>
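For a rough sense of the numbers involved, below is a minimal standalone
sketch (not Ceph code) that plugs the log_bytes value from the perf dump
above into the heuristic Sage quoted.  The "expected" size here is an
illustrative assumption: the real value comes from _estimate_log_size()
and depends on file_map, dir_map, and block_all, though with ~33 files it
should stay in the tens of KiB.  The 16 MiB value for
bluefs_log_compact_min_size is also an assumed default.

// Back-of-envelope sketch of the _should_compact_log() check quoted above,
// using the reporter's perf counters.  Assumed values are marked as such.
#include <cstdint>
#include <iostream>

int main() {
  uint64_t current   = 4380053504ULL;     // bluefs "log_bytes" from the perf dump
  uint64_t expected  = 64 * 1024;         // assumed _estimate_log_size() result
  uint64_t min_size  = 16 * 1024 * 1024;  // bluefs_log_compact_min_size (assumed default)
  float    min_ratio = 5.0f;              // bluefs_log_compact_min_ratio (default, per above)

  // Mirrors the heuristic: compact only if the log is both large in
  // absolute terms and much larger than its estimated compacted size.
  float ratio = (float)current / (float)expected;
  bool should_compact = (current >= min_size) && (ratio >= min_ratio);
  std::cout << "ratio " << ratio
            << (should_compact ? " -> would compact" : " -> would not compact")
            << std::endl;
  return 0;
}

Under these assumptions the ratio is orders of magnitude above the default
min_ratio of 5, so the check would return true whenever it actually runs;
the open question in the thread is how often BlueFS::sync_metadata()
reaches that check.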