Hi Jorge,
I was sort of responsible for all of this. :)
So basically there are different caches in different places:
- rocksdb bloom filter and index cache
- rocksdb block cache (which can be configured to include filters and
indexes)
- rocksdb compressed block cache
- bluestore onode cache
The bluestore onode cache is the only one that stores onode/extent/blob
metadata before it is encoded, i.e. it's bigger but has lower impact on
the CPU. The next step is the regular rocksdb block cache, where we've
already encoded the data but it's not compressed. Optionally we could
also compress the data and then cache it using rocksdb's compressed
block cache. Finally, rocksdb can set memory aside for bloom filters
and indexes, but we're configuring those to go into the block cache so
we can get better accounting of how memory is being used (otherwise it's
difficult to control how much memory indexes and filters get). The
downside is that bloom filters and indexes can theoretically get paged
out under heavy cache pressure. To help avoid this, we set them to be
high priority in the block cache and also pin the L0 filters/indexes.
In the testing I did earlier this year, what I saw is that in low memory
scenarios it's almost always best to give all of the cache to rocksdb's
block cache. Once you hit about the 512MB mark, we start seeing bigger
gains by giving additional memory to bluestore's onode cache. So we
devised a mechanism where you can decide where to cut over. It's quite
possible that on very fast CPUs it might make sense to use rocksdb's
compressed block cache, or that these ratios might change if you have a
huge number of objects. The values we have now were sort of the best
jack-of-all-trades values we found.
Mark
On 10/11/2017 08:32 AM, Jorge Pinilla López wrote:
Okay, thanks for the explanation. So of the 3GB of cache (the default
cache size for SSD), only 0.5GB goes to K/V and 2.5GB goes to metadata.
Is there a way of knowing how much K/V, metadata and data is being
stored and how full the cache is, so I can adjust my ratios? I was
thinking of some ratios (like 0.9 K/V, 0.07 meta, 0.03 data), but I'm
only speculating; I don't have any real data.
On 11/10/2017 at 14:32, Mohamad Gebai wrote:
Hi Jorge,
On 10/10/2017 07:23 AM, Jorge Pinilla López wrote:
Are .99 KV, .01 MetaData and .0 Data ratios right? They seem a little
too disproportionate.
Yes, this is correct.
Also, .99 KV and a cache of 3GB for SSD means that almost all of the 3GB
would be used for KV, but there is also another attribute called
bluestore_cache_kv_max, which is 512MB by default. Then what is the rest
of the cache used for? Nothing? Shouldn't there be a higher kv_max value
or a lower KV ratio?
Anything over the *cache_kv_max value goes to the metadata cache. You
can look in your logs to see the final values of kv, metadata and data
cache ratios. To get data cache, you need to lower the ratios of
metadata and kv caches.
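
To make the arithmetic concrete, here is a rough Python sketch of the
split as described above. It is not the actual bluestore code: the
function name split_cache and its parameters are made up for
illustration, and rounding is simplified so that the three parts always
add up to the total.

```python
GiB = 1024 ** 3
MiB = 1024 ** 2


def split_cache(cache_size, kv_ratio, meta_ratio, kv_max):
    """Return (kv, meta, data) in bytes for a hypothetical cache split.

    Assumption: the data share is whatever the kv and meta ratios leave
    over, and any kv share above kv_max is handed to the metadata cache.
    """
    if kv_ratio < 0 or meta_ratio < 0 or kv_ratio + meta_ratio > 1.0 + 1e-9:
        raise ValueError("ratios must be non-negative and sum to at most 1")

    kv = int(cache_size * kv_ratio)
    data_ratio = max(0.0, 1.0 - kv_ratio - meta_ratio)
    data = int(cache_size * data_ratio)
    # Metadata takes the remainder so the parts sum to cache_size exactly.
    meta = cache_size - kv - data

    # Anything over kv_max goes to the metadata cache.
    if kv > kv_max:
        meta += kv - kv_max
        kv = kv_max
    return kv, meta, data


# Defaults discussed in this thread: 3GB cache, .99 KV, .01 meta, 512MB kv_max.
kv, meta, data = split_cache(3 * GiB, 0.99, 0.01, 512 * MiB)
print(kv / GiB, meta / GiB, data / GiB)
```

With the defaults from this thread, the kv share is capped at 512MB and
the other 2.5GB lands in the metadata cache, which matches the 0.5/2.5
split mentioned earlier. Lowering the kv and meta ratios is the only way
to get a non-zero data share in this model.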
Mohamad
--
------------------------------------------------------------------------
*Jorge Pinilla López*
jorpilo@xxxxxxxxx
Computer engineering student
Systems area intern (SICUZ)
Universidad de Zaragoza
PGP-KeyID: A34331932EBC715A
<http://pgp.rediris.es:11371/pks/lookup?op=get&search=0xA34331932EBC715A>
------------------------------------------------------------------------
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com