Re: block.db/block.wal device performance dropped after upgrade to 14.2.10

Yeah, there are cases where enabling it will improve performance, as rocksdb can then use the page cache as a (potentially large) secondary cache beyond the block cache and avoid hitting the underlying devices for reads.  Do you have a lot of spare memory for page cache on your OSD nodes?  You may be able to improve the situation with bluefs_buffered_io=false by increasing osd_memory_target, which should give the rocksdb block cache more memory to work with directly.  One downside is that we currently double-cache onodes in both the rocksdb cache and the bluestore onode cache, which hurts us when memory is limited.  We have some experimental work that might help in this area by better balancing the bluestore onode and rocksdb block caches, but it needs to be rebased after Adam's column family sharding work.
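(For illustration, a minimal sketch of raising the target via the centralized config database available in Nautilus; the 8 GiB value here is purely an example and should be sized to the memory actually available on your nodes:)

    # give each OSD a larger memory target so the rocksdb block
    # cache has more memory to work with (example value: 8 GiB)
    ceph config set osd osd_memory_target 8589934592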

The reason we had to disable bluefs_buffered_io again was that we had users with certain RGW workloads where the kernel started swapping large amounts of memory on the OSD nodes despite seemingly having free memory available.  This caused huge latency spikes and IO slowdowns (even stalls).  We never noticed it in our QA test suites, and it doesn't appear to happen with RBD workloads as far as I can tell, but when it does happen it's really painful.
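(A generic way to spot this pattern on an OSD node, not specific to Ceph, is to watch swap traffic next to free memory:)

    # si/so columns show pages swapped in/out per second; sustained
    # nonzero values while free memory is still reported would match
    # the behavior described above
    vmstat 5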


Mark


On 8/6/20 6:53 AM, Manuel Lausch wrote:
Hi,

I found the reason for this behavior change.
With 14.2.10 the default value of "bluefs_buffered_io" was changed from
true to false.
https://tracker.ceph.com/issues/44818

Configuring this to true seems to solve my problems.
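(For anyone else hitting this, a minimal sketch of flipping the option back via the centralized config database; as far as I know the OSDs need to be restarted for it to take effect:)

    # restore the pre-14.2.10 default behavior
    ceph config set osd bluefs_buffered_io true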

Regards
Manuel

On Wed, 5 Aug 2020 13:30:45 +0200
Manuel Lausch <manuel.lausch@xxxxxxxx> wrote:

Hello Vladimir,

I just tested this with a single-node test cluster with 60 HDDs (3 of
them with bluestore without separate wal and db).

With 14.2.10, I see a lot of read IOPS on the bluestore OSDs while
snaptrimming. With 14.2.9 this was not an issue.
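(A simple way to see this, assuming the standard sysstat tooling is installed on the OSD node; the r/s column shows per-device read IOPS while snaptrimming runs:)

    # watch extended per-device stats every 5 seconds
    iostat -x 5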

I wonder if this would explain the huge number of slow ops on my big
test cluster (44 nodes, 1056 OSDs) while snaptrimming. I
cannot test a downgrade there, because no packages of older
releases are available for CentOS 8.

Regards
Manuel

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx




