On 1/11/19 7:56 AM, Dr. Jens Harbott (frickler) wrote:
> On Mon, Nov 26, 2018 at 17:52, Josh Durgin <jdurgin@xxxxxxxxxx> wrote:
>> Recently Mark Nelson added the concept of a memory target to the OSD -
>> it attempts to keep RSS within a certain size by autotuning the
>> BlueStore cache size, and also adjusts how the cache works internally to
>> perform better.
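[Editor's note: the tunable Josh describes is the osd_memory_target option. As a hedged illustration only (assuming a release that includes the autotuner and the centralized config database; the 4 GiB value simply matches the target shown in the log excerpt below):]

```shell
# Set a 4 GiB memory target for all OSDs via the config database.
# osd_memory_target is the option the autotuner steers toward; adjust
# the value (in bytes) to your hardware.
ceph config set osd osd_memory_target 4294967296

# Or for a single OSD (osd.0 is an example id):
ceph config set osd.0 osd_memory_target 4294967296
```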
> Is there a description somewhere how to collect the data for the cache
> usage statistics that you present in your slides? I'd like to monitor
> those values in my live setup and verify they look sensible under my
> specific workload.
> Yours,
> Jens
Hi, currently those values are only available via level 5 bluestore debug
output:
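[Editor's note: for reference, a hedged sketch of raising the debug level on a running OSD (osd.0 is an example id; exact invocation may vary by release):]

```shell
# Raise bluestore debug logging to level 5 on a running OSD:
ceph tell osd.0 injectargs '--debug_bluestore 5/5'

# Or persistently, via the config database (Mimic and later):
ceph config set osd debug_bluestore 5/5
```

With that in place, the OSD log contains lines like the following: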
2019-01-04 08:44:18.307 7fc1981c1700 5
bluestore.MempoolThread(0x5587fd62ab20) _trim_shards cache_size:
2845415832 kv_alloc: 268435456 kv_used: 2407708 meta_alloc: 855638016
meta_used: 585053878 data_alloc: 1677721600 data_used: 1711161344
2019-01-04 08:44:18.939 7fc1981c1700 5
bluestore.MempoolThread(0x5587fd62ab20) _tune_cache_size target:
4294967296 heap: 2868805632 unmapped: 18399232 mapped: 2850406400 old
cache_size: 2845415832 new cache size: 2845415832
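[Editor's note: to monitor these values as Jens asks, one option is to scrape the OSD log. A minimal sketch of a parser for the _trim_shards line above; the regex and field names simply follow that log line, which is internal output and may change between releases:]

```python
import re

# Matches the "_trim_shards" level-5 debug line shown above; field names
# mirror the log output (cache_size, kv_alloc, kv_used, meta_alloc,
# meta_used, data_alloc, data_used). This format is not a stable interface.
TRIM_RE = re.compile(
    r'_trim_shards\s+cache_size:\s*(?P<cache_size>\d+)'
    r'\s+kv_alloc:\s*(?P<kv_alloc>\d+)\s+kv_used:\s*(?P<kv_used>\d+)'
    r'\s+meta_alloc:\s*(?P<meta_alloc>\d+)\s+meta_used:\s*(?P<meta_used>\d+)'
    r'\s+data_alloc:\s*(?P<data_alloc>\d+)\s+data_used:\s*(?P<data_used>\d+)'
)

def parse_trim_line(line):
    """Return a dict of byte counters from a _trim_shards log line, or None."""
    m = TRIM_RE.search(line)
    if m is None:
        return None
    return {k: int(v) for k, v in m.groupdict().items()}

# The log line from this thread (unwrapped back onto one line):
sample = ('2019-01-04 08:44:18.307 7fc1981c1700 5 '
          'bluestore.MempoolThread(0x5587fd62ab20) _trim_shards cache_size: '
          '2845415832 kv_alloc: 268435456 kv_used: 2407708 meta_alloc: '
          '855638016 meta_used: 585053878 data_alloc: 1677721600 '
          'data_used: 1711161344')

if __name__ == '__main__':
    print(parse_trim_line(sample))
```

From here the dict can be fed to whatever metrics pipeline is in use.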
I should note that the current implementation is a little dumb and may
end up over-favoring KV or onode cache in situations where the cluster
is used for both RBD and RGW workloads (but not beyond the cache ratio
settings). I have a large PR that is smarter (though it may not actually
be faster due to memory allocator overhead) and that also adds
per-priority byte counters to the admin socket. It was segfaulting during
QA testing
so I'm breaking it down into a series of smaller PRs that we can test
individually. The commit that exposes the admin socket counters hasn't
been applied yet. Here's the larger one that also includes that code:
https://github.com/ceph/ceph/pull/23710
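[Editor's note: independent of that PR, the existing mempool accounting can already be read from the admin socket today. A sketch, assuming local access to the OSD's admin socket and osd.0 as an example id:]

```shell
# Dump per-pool memory accounting (including the bluestore cache pools,
# e.g. bluestore_cache_data and bluestore_cache_onode) from a local OSD:
ceph daemon osd.0 dump_mempools
```

This gives item and byte counts per mempool, which is a reasonable starting point for the kind of monitoring Jens describes.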
Thanks,
Mark