Hi, I'm facing this issue too, and I see the same RocksDB log output that Mark attached in my own cluster, which means there are burst reads on my block.db. I've posted some details about my case in this thread [1]. I hope you can help me figure out what's going on in my cluster. Thanks.

[1]: https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/PHB53F3OD7QN5FG3CXGKTLWE77OHIBBO/

On Mon, Aug 10, 2020 at 8:05 PM Mark Nelson <mnelson@xxxxxxxxxx> wrote:

> Yeah, I know various folks have adopted those settings, though I'm not
> convinced they are better than our defaults. Basically you get more,
> smaller buffers, you start compacting sooner, and theoretically you
> should get a more gradual throttle, along with a bunch of changes to
> compaction behavior. But every time I've tried a setup like that I see
> more write amplification in L0, presumably due to a larger number of
> pglog entries not being tombstoned before hitting it (at least on our
> systems it's not faster at this point, and it imposes more wear on the
> DB device). I suspect something closer to those settings will be better,
> though, if we can change the pglog to create/delete new kv pairs for
> every pglog entry.
>
> In any event, it's good to know that compaction isn't involved. I think
> this may be a case where the double-caching fix might help
> significantly if we stop thrashing the rocksdb block cache:
> https://github.com/ceph/ceph/pull/27705
>
> Mark
>
> On 8/10/20 2:28 AM, Manuel Lausch wrote:
> > Hi Mark,
> >
> > RocksDB compactions were one of my first ideas as well, but they don't
> > correlate. I checked this with the ceph_rocksdb_log_parser.py from
> > https://github.com/ceph/cbt.git
> > I saw only a few compactions across the whole cluster, so they didn't
> > seem to be the problem, although the compactions sometimes took
> > several seconds.
> >
> > BTW: I configured the following RocksDB options:
> > bluestore rocksdb options = compression=kNoCompression,max_write_buffer_number=32,min_write_buffer_number_to_merge=2,recycle_log_file_num=32,compaction_style=kCompactionStyleLevel,write_buffer_size=67108864,target_file_size_base=67108864,max_background_compactions=31,level0_file_num_compaction_trigger=8,level0_slowdown_writes_trigger=32,level0_stop_writes_trigger=64,max_bytes_for_level_base=536870912,compaction_threads=32,max_bytes_for_level_multiplier=8,flusher_threads=8,compaction_readahead_size=2MB
> >
> > This reduced some IO spikes, but the slow-ops issue during snap
> > trimming was not affected by it.
> >
> > Manuel
> >
> > On Fri, 7 Aug 2020 09:43:51 -0500
> > Mark Nelson <mnelson@xxxxxxxxxx> wrote:
> >
> >> That is super interesting regarding scrubbing. I would have expected
> >> that to be affected as well. Any chance you can check and see if
> >> there is any correlation between rocksdb compaction events, snap
> >> trimming, and increased disk reads? Also (sorry if you already
> >> answered this), do we know for sure that it's hitting the
> >> block.db/block.wal device? I suspect it is, just wanted to verify.
> >>
> >> Mark
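
In case it helps anyone else line up compaction events against the read
bursts the way Mark suggested, below is a minimal sketch (not the cbt
parser Manuel mentioned) that pulls RocksDB's EVENT_LOG_v1 lines out of a
LOG file so their timestamps can be compared against iostat/sar output.
The export step in the comment and the exact event fields are assumptions
about a standard RocksDB LOG; adjust them for your deployment.

#!/usr/bin/env python3
# Sketch: extract compaction/flush events from a RocksDB LOG so they can be
# lined up against disk-read spikes (e.g. from timestamped "iostat -x" runs).
# Assumption: the LOG was exported from the OSD's DB (for example via
# "ceph-bluestore-tool bluefs-export" on a stopped OSD) and contains
# RocksDB's standard EVENT_LOG_v1 JSON lines.
import json
import sys
from datetime import datetime, timezone

def events(path):
    marker = "EVENT_LOG_v1"
    with open(path, errors="replace") as f:
        for line in f:
            pos = line.find(marker)
            if pos == -1:
                continue
            try:
                ev = json.loads(line[pos + len(marker):])
            except ValueError:
                continue
            if ev.get("event", "").startswith(("compaction_", "flush_")):
                yield ev

def main():
    if len(sys.argv) != 2:
        sys.exit("usage: rocksdb_events.py <rocksdb LOG file>")
    for ev in events(sys.argv[1]):
        ts = datetime.fromtimestamp(ev["time_micros"] / 1e6, tz=timezone.utc)
        print(ts.isoformat(), ev["event"],
              "level", ev.get("output_level", "-"),
              "job", ev.get("job", "-"))

if __name__ == "__main__":
    main()

Comparing those timestamps against the disk-read spikes should show fairly
quickly whether compactions correlate with the bursts or not.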
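
And for Mark's other question (whether the reads really land on the
block.db/block.wal device rather than the data device), a quick per-device
read sampler is enough to see where the burst hits. The device names below
are placeholders, not taken from anyone's cluster; substitute the devices
backing your block.db and block.

#!/usr/bin/env python3
# Sketch: print per-second read throughput for a couple of block devices so
# a burst on the block.db device stands out next to the data device.
# Device names are placeholders; find yours with "ceph-volume lvm list" or
# "ls -l /var/lib/ceph/osd/ceph-N/block*".
import time

DEVICES = ["nvme0n1", "sdb"]  # placeholder: block.db device, data device

def sectors_read(dev):
    # /sys/block/<dev>/stat: the third field is "sectors read" (512-byte units).
    with open(f"/sys/block/{dev}/stat") as f:
        return int(f.read().split()[2])

def main():
    prev = {d: sectors_read(d) for d in DEVICES}
    while True:
        time.sleep(1)
        now = {d: sectors_read(d) for d in DEVICES}
        line = "  ".join(
            f"{d}: {(now[d] - prev[d]) * 512 / 1e6:8.1f} MB/s read"
            for d in DEVICES)
        print(time.strftime("%H:%M:%S"), line)
        prev = now

if __name__ == "__main__":
    main()

Running it across a snap-trim window should make it obvious whether the
spike is on the DB device or on the data device.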