Yeah, I know various folks have adopted those settings, though I'm not
convinced they are better than our defaults. Basically you get more,
smaller buffers, start compacting sooner, and theoretically should
have a more gradual throttle, along with a bunch of changes to
compaction. But every time I've tried a setup like that I see more write
amplification in L0, presumably due to a larger number of pglog entries
not being tombstoned before hitting it (at least on our systems it's not
faster at this time, and it imposes more wear on the DB device). I suspect
something closer to those settings will be better, though, if we can
change the pglog to create/delete new kv pairs for every pglog entry.
In any event, that's good to know about compaction not being involved.
I think this may be a case where the double-caching fix might help
significantly if we stop thrashing the rocksdb block cache:
https://github.com/ceph/ceph/pull/27705
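If it helps, one rough way to see how an OSD's cache memory is currently
split up while the trims are running is to dump its mempool stats over the
admin socket. This is just a sketch, not something from the PR above; the
pool names and the exact JSON layout ("mempool"/"by_pool") vary by release,
so treat it as illustrative:

#!/usr/bin/env python3
# Sketch: dump mempool usage from an OSD admin socket and print the
# cache/buffer related pools. Pool names and JSON layout are version
# dependent, so adjust as needed. Run on the OSD host.
import json
import subprocess
import sys

osd_id = sys.argv[1] if len(sys.argv) > 1 else "0"
out = subprocess.check_output(
    ["ceph", "daemon", f"osd.{osd_id}", "dump_mempools"])
pools = json.loads(out).get("mempool", {}).get("by_pool", {})
for name, stats in sorted(pools.items()):
    if "cache" in name or "buffer" in name:
        print(f"{name:30s} items={stats.get('items', 0):>12} "
              f"bytes={stats.get('bytes', 0):>14}")

Comparing a couple of snapshots of that output before and during a snaptrim
would at least show whether the cache pools shrink while the reads spike.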
Mark
On 8/10/20 2:28 AM, Manuel Lausch wrote:
Hi Mark,
RocksDB compactions were one of my first ideas as well, but they don't
correlate. I checked this with ceph_rocksdb_log_parser.py from
https://github.com/ceph/cbt.git
I saw only a few compactions on the whole cluster. It didn't seem to be
the problem, although the compactions sometimes took several seconds.
BTW: I configured the following rocksdb options.
bluestore rocksdb options = compression=kNoCompression,max_write_buffer_number=32,min_write_buffer_number_to_merge=2,recycle_log_file_num=32,compaction_style=kCompactionStyleLevel,write_buffer_size=67108864,target_file_size_base=67108864,max_background_compactions=31,level0_file_num_compaction_trigger=8,level0_slowdown_writes_trigger=32,level0_stop_writes_trigger=64,max_bytes_for_level_base=536870912,compaction_threads=32,max_bytes_for_level_multiplier=8,flusher_threads=8,compaction_readahead_size=2MB
This reduced some IO spikes, but the slow ops issue during snaptrim was
not affected by it.
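For anyone who wants to eyeball or diff those settings, here is a trivial
sketch (plain string handling, nothing Ceph-specific) that splits the
option string above into key/value pairs and optionally compares it against
a second option string, e.g. whatever your cluster reports as its defaults:

#!/usr/bin/env python3
# Trivial sketch: split a bluestore_rocksdb_options string into key/value
# pairs so the individual settings are easier to read, and optionally diff
# it against a second option string passed on the command line.
import sys

TUNED = ("compression=kNoCompression,max_write_buffer_number=32,"
         "min_write_buffer_number_to_merge=2,recycle_log_file_num=32,"
         "compaction_style=kCompactionStyleLevel,write_buffer_size=67108864,"
         "target_file_size_base=67108864,max_background_compactions=31,"
         "level0_file_num_compaction_trigger=8,"
         "level0_slowdown_writes_trigger=32,level0_stop_writes_trigger=64,"
         "max_bytes_for_level_base=536870912,compaction_threads=32,"
         "max_bytes_for_level_multiplier=8,flusher_threads=8,"
         "compaction_readahead_size=2MB")

def parse(opts):
    # "k1=v1,k2=v2,..." -> {"k1": "v1", "k2": "v2", ...}
    return dict(kv.split("=", 1) for kv in opts.split(",") if kv)

tuned = parse(TUNED)
other = parse(sys.argv[1]) if len(sys.argv) > 1 else {}
for key in sorted(set(tuned) | set(other)):
    print(f"{key:40s} {tuned.get(key, '-'):>24s} {other.get(key, '-'):>24s}")

Run with no argument it just lists the settings; with a second option
string it shows which keys differ.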
Manuel
On Fri, 7 Aug 2020 09:43:51 -0500
Mark Nelson <mnelson@xxxxxxxxxx> wrote:
That is super interesting regarding scrubbing. I would have expected
that to be affected as well. Any chance you can check and see if
there is any correlation between rocksdb compaction events, snap
trimming, and increased disk reads? Also (sorry if you already
answered this), do we know for sure that it's hitting the
block.db/block.wal device? I suspect it is, just wanted to verify.
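If it helps with that correlation, here is a rough sketch (not the cbt
parser; the RocksDB event-log field names vary between versions, hence the
defensive .get() calls) that pulls the compaction/flush events and their
timestamps out of the rocksdb lines in an OSD log:

#!/usr/bin/env python3
# Sketch: extract RocksDB "EVENT_LOG_v1" compaction/flush events from an
# OSD log (or a rocksdb LOG file) so their timestamps can be lined up
# against snaptrim activity and disk read stats.
import json
import sys

MARKER = "EVENT_LOG_v1"

def events(path):
    with open(path, errors="replace") as f:
        for line in f:
            pos = line.find(MARKER)
            if pos == -1:
                continue
            try:
                yield json.loads(line[pos + len(MARKER):].strip())
            except ValueError:
                continue  # line was truncated or not pure JSON

for ev in events(sys.argv[1]):
    if ev.get("event") in ("compaction_started", "compaction_finished",
                           "flush_started", "flush_finished"):
        print(ev.get("time_micros"), ev.get("event"),
              "level:", ev.get("output_level"),
              "bytes:", ev.get("total_output_size"))

Lining those timestamps up against snaptrim start/stop and iostat samples
from the block.db device should show whether the extra reads track the
compactions or not.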
Mark
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx