Re: bluestore_cache_size_ssd and bluestore_cache_size_hdd default values


 



On 03/16/2018 11:08 AM, Sage Weil wrote:
On Fri, 16 Mar 2018, Wido den Hollander wrote:
Hi,

The config values bluestore_cache_size_ssd and bluestore_cache_size_hdd
determine how much memory an OSD running with BlueStore will use for caching.

By default the values are:

bluestore_cache_size_ssd = 3GB
bluestore_cache_size_hdd = 1GB
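
Both options take a byte count. For reference, a minimal ceph.conf snippet
to override them could look like this (the values shown are just the current
defaults written out in bytes):

  [osd]
  bluestore_cache_size_ssd = 3221225472  # 3 GiB per SSD-backed OSD
  bluestore_cache_size_hdd = 1073741824  # 1 GiB per HDD-backed OSD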

I've seen some cases recently where users migrated from FileStore to
BlueStore and had the OOM-killer come along during backfill/recovery
situations. Those are exactly the situations where OSDs require more memory.

It's not uncommon to find servers with:

- 8 SSDs and 32GB RAM
- 16 SSDs and 64GB RAM

With FileStore it was sufficient since the page cache did all the work,
but with BlueStore each OSD has its own cache, which isn't shared.

In addition there is the regular memory consumption and the overhead of
the cache.
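
As a rough back-of-the-envelope (the per-OSD baseline/overhead figure here
is an estimate, not a measured number):

  16 SSD OSDs x (3 GB cache + ~1-1.5 GB base usage and overhead) ~= 64-72 GB

which already exceeds the 64GB of RAM in the example above before any
backfill or recovery has even started.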

I also don't understand the idea behind the values. As HDDs are slower,
they usually require more cache than SSDs, so I'd expect the values to be
flipped.

My recommendation would be to lower the value to 1GB to prevent users
from having a bad experience when going from FileStore to BlueStore.

I have created a pull request for this:
https://github.com/ceph/ceph/pull/20940
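
Until something along those lines lands, the same effect can of course be
had by overriding the defaults per cluster; as far as I know a non-zero
bluestore_cache_size overrides both the _ssd and _hdd variants, e.g.:

  [osd]
  bluestore_cache_size = 1073741824  # 1 GiB, regardless of media type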

Opinions, experiences, feedback?

The thinking was that bluestore requires some deliberate thought and
tuning of the cache size, so we may as well pick defaults that make
sense.  Since the admin is doing the filestore -> bluestore conversion,
that is the point where they consider the memory requirement and adjust
the config as necessary.

As for why the defaults are different, the SSDs need a larger cache to
capture the SSD performance, and the nodes that have them are likely to be
"higher end" and have more memory.  The idea is to minimize the number
of people who will need to adjust their config.

Perhaps the missing piece here is that the filestore->bluestore conversion
doc should have a section about memory requirements and tuning
bluestore_cache_size accordingly?  If we just reduce the default to
satisfy the lowest common denominator, we'll kill performance for the
majority that has more memory.
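
Something along the lines of the following rule of thumb could go into that
doc section (the per-OSD overhead figure is a rough guess and would need
validating):

  bluestore_cache_size ~= (host RAM - OS and other daemons) / number of OSDs
                          - per-OSD baseline (~1-1.5 GB)

e.g. a 64GB host with ~4GB reserved and 16 OSDs gives roughly
(64 - 4) / 16 - 1.5 ~= 2GB of cache per OSD.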

On a side note, we are not currently enforcing a hard cap on rocksdb block cache usage. During certain test scenarios, I've observed the block cache exceeding the soft cap during compaction. I suspect this is primarily an issue when dealing with very fast storage and very low memory, but it may contribute to scenarios where folks are going OOM on low memory configurations.
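
For anyone hitting this on a low-memory box, the share of the bluestore cache
handed to the rocksdb block cache can be nudged down; the option names below
are from memory and worth double-checking against your release:

  [osd]
  bluestore_cache_kv_ratio = 0.2      # fraction of the cache for the rocksdb block cache
  bluestore_cache_kv_max = 268435456  # cap that share at 256 MiB

Note this only shrinks the block cache's share; it doesn't turn the soft cap
into a hard one.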

Mark


sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



