On Fri, 16 Mar 2018, Wido den Hollander wrote:
> Hi,
>
> The config values bluestore_cache_size_ssd and bluestore_cache_size_hdd
> determine how much memory an OSD running with BlueStore will use for caching.
>
> By default the values are:
>
> bluestore_cache_size_ssd = 3GB
> bluestore_cache_size_hdd = 1GB
>
> I've seen some cases recently where users migrated from FileStore to
> BlueStore and had the OOM-killer come along during backfill/recovery
> situations. These are the situations where OSDs require more memory.
>
> It's not uncommon to find servers with:
>
> - 8 SSDs and 32GB RAM
> - 16 SSDs and 64GB RAM
>
> With FileStore it was sufficient since the page cache did all the work,
> but with BlueStore each OSD has its own cache which isn't shared.
>
> In addition there is the regular memory consumption and the overhead of
> the cache.
>
> I also don't understand the ideas behind the values. As HDDs are slower
> they usually require more cache than SSDs, so I'd expect the values to be
> flipped.
>
> My recommendation would be to lower the value to 1GB to prevent users
> from having a bad experience when going from FileStore to BlueStore.
>
> I have created a pull request for this:
> https://github.com/ceph/ceph/pull/20940
>
> Opinions, experiences, feedback?

The thinking was that bluestore requires some deliberate thought and
tuning of the cache size, so we may as well pick defaults that make
sense. Since the admin is doing the filestore -> bluestore conversion,
that is the point where they consider the memory requirement and adjust
the config as necessary.

As for why the defaults are different, the SSDs need a larger cache to
capture the SSD performance, and the nodes that have them are likely to
be "higher end" and have more memory. The idea is to minimize the number
of people that will need to adjust their config.

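To put rough numbers on the memory requirement for the node shapes
quoted above, here is a minimal sketch of the arithmetic. It only counts
the configured cache size per OSD; the regular OSD memory consumption and
cache overhead come on top of that. The helper names and the 50% figure
in the second function are made up for illustration, not a
recommendation.

```python
GIB = 1024 ** 3


def cache_budget(num_osds, cache_bytes_per_osd, ram_bytes):
    """Return (total cache bytes, fraction of RAM) for one node.

    Counts only the configured BlueStore cache; other OSD memory
    use is not included.
    """
    total = num_osds * cache_bytes_per_osd
    return total, total / ram_bytes


def cache_for_fraction(ram_bytes, num_osds, fraction):
    """Per-OSD cache size (bytes) if `fraction` of RAM is given to caches.

    `fraction` is a hypothetical knob chosen by the admin.
    """
    return int(ram_bytes * fraction / num_osds)


if __name__ == "__main__":
    # Node shapes from the quoted message, with the 3GB SSD default.
    for num_osds, ram_gib in [(8, 32), (16, 64)]:
        total, frac = cache_budget(num_osds, 3 * GIB, ram_gib * GIB)
        print(f"{num_osds} OSDs, {ram_gib} GiB RAM: "
              f"cache {total // GIB} GiB ({frac:.0%} of RAM)")

    # Example: spend half of a 32 GiB node on caches across 8 OSDs.
    per_osd = cache_for_fraction(32 * GIB, 8, 0.5)
    print(f"per-OSD cache at 50%: {per_osd} bytes")
```

With the 3GB default, both example nodes end up with 75% of RAM
configured as cache before counting anything else, which is consistent
with the OOM reports during backfill/recovery. The per-OSD value from
the second function is in bytes.
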
Perhaps the missing piece here is that the filestore->bluestore
conversion doc should have a section about memory requirements and
tuning bluestore_cache_size accordingly? If we just reduce the default
to satisfy the lowest common denominator we'll kill performance for the
majority that has more memory.

sage