Re: block.db/block.wal device performance dropped after upgrade to 14.2.10


 



Thinking about this a little more, one thing I remember from when I was writing the priority cache manager is that in some cases I saw strange behavior with the rocksdb block cache when compaction was performed: it appeared that the entire contents of the cache could be invalidated.  That would only make sense if rocksdb was trimming old entries from the cache rather than just the entries associated with (now deleted) SST files, or perhaps it waits to delete all SST files until the end of the compaction cycle, forcing old entries out of the cache and then invalidating the whole works.

In any event, I wonder if having the secondary page cache is enough on your clusters to work around all of this, by keeping the SST files associated with the previously heavily used blocks in page cache until compaction completes.  Maybe the combination of snap trimming or other background work along with compaction is just totally thrashing the rocksdb block cache.  For folks who feel comfortable watching IO hitting their DB devices: can you see if you have increased bursts of reads to the DB device after a compaction event has occurred?  Compaction events look like this in the OSD logs:


2020-08-04T17:15:56.603+0000 7fb0cf60d700  4 rocksdb: (Original Log Time 2020/08/04-17:15:56.603585) EVENT_LOG_v1 {"time_micros": 1596561356603574, "job": 5, "event": "compaction_finished", "compaction_time_micros": 744532, "compaction_time_cpu_micros": 607655, "output_level": 1, "num_output_files": 2, "total_output_size": 84712923, "num_input_records": 1714260, "num_output_records": 658541, "num_subcompactions": 1, "output_compression": "NoCompression", "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [0, 2, 0, 0, 0, 0, 0]}


You can also run this tool to get a nicely formatted list of them, though it doesn't report wall-clock timestamps, just the time offset from the start of the log, so looking at the OSD logs directly is easier for matching up timestamps.


https://github.com/ceph/cbt/blob/master/tools/ceph_rocksdb_log_parser.py
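If all you need are the wall-clock timestamps of compaction events, a small hypothetical sketch along these lines (not part of the cbt tool above) could pull the compaction_finished entries straight out of an OSD log:

```python
import json
import re
import sys

# Matches the leading wall-clock timestamp and the EVENT_LOG_v1 JSON
# payload in rocksdb log lines like the compaction_finished example above.
EVENT_RE = re.compile(r'^(\S+)\s.*EVENT_LOG_v1\s+(\{.*\})')

def compaction_events(lines):
    """Yield (timestamp, event_dict) for every compaction_finished line."""
    for line in lines:
        m = EVENT_RE.search(line)
        if not m:
            continue
        event = json.loads(m.group(2))
        if event.get("event") == "compaction_finished":
            yield m.group(1), event

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        for ts, ev in compaction_events(f):
            print(ts, ev["output_level"], ev["total_output_size"])
```

Run it against an OSD log and correlate the printed timestamps with read bursts on the DB device (e.g. from iostat).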


Mark


On 8/6/20 8:07 AM, Vladimir Prokofev wrote:
Manuel, thank you for your input.
This is actually huge, and the problem is exactly that.

As a side note, I'll add that since the update I observed lower memory
utilisation on the OSD nodes and high throughput on the block.db devices
(up to 100+ MB/s) that was not there before, so logically some operations
that were previously performed in memory were now being executed directly
against the block device. I was digging through possible causes, but your
time-saving message arrived first.
Thank you!

чт, 6 авг. 2020 г. в 14:56, Manuel Lausch <manuel.lausch@xxxxxxxx>:

Hi,

I found the reason for this behavior change.
With 14.2.10 the default value of "bluefs_buffered_io" was changed from
true to false.
https://tracker.ceph.com/issues/44818

After configuring this back to true, my problems seem to be solved.
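(For anyone wanting to try the same thing, the ceph.conf fragment would look something like the sketch below; depending on your deployment you could instead set it via `ceph config set osd bluefs_buffered_io true`, and the OSDs may need a restart for it to take effect.)

```ini
[osd]
bluefs_buffered_io = true
```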

Regards
Manuel

On Wed, 5 Aug 2020 13:30:45 +0200
Manuel Lausch <manuel.lausch@xxxxxxxx> wrote:

Hello Vladimir,

I just tested this on a single-node test cluster with 60 HDDs (3 of
them with bluestore without a separate WAL and DB).

With the 14.2.10, I see on the bluestore OSDs a lot of read IOPs while
snaptrimming. With 14.2.9 this was not an issue.

I wonder if this would explain the huge amount of slow ops on my big
test cluster (44 nodes, 1056 OSDs) while snaptrimming. I
cannot test a downgrade there, because no packages of older
releases are available for CentOS 8.

Regards
Manuel

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx




