On 03/16/2018 05:08 PM, Sage Weil wrote:
> On Fri, 16 Mar 2018, Wido den Hollander wrote:
>> Hi,
>>
>> The config values bluestore_cache_size_ssd and bluestore_cache_size_hdd
>> determine how much memory an OSD running with BlueStore will use for
>> caching.
>>
>> By default the values are:
>>
>> bluestore_cache_size_ssd = 3GB
>> bluestore_cache_size_hdd = 1GB
>>
>> I've seen some cases recently where users migrated from FileStore to
>> BlueStore and had the OOM-killer come along during backfill/recovery
>> situations. These are the situations where OSDs require more memory.
>>
>> It's not uncommon to find servers with:
>>
>> - 8 SSDs and 32GB RAM
>> - 16 SSDs and 64GB RAM
>>
>> With FileStore that was sufficient, since the page cache did all the
>> work, but with BlueStore each OSD has its own cache which isn't shared.
>>
>> In addition there is the regular memory consumption and the overhead of
>> the cache.
>>
>> I also don't understand the ideas behind the values. As HDDs are slower,
>> they usually require more cache than SSDs, so I'd expect the values to
>> be flipped.
>>
>> My recommendation would be to lower the value to 1GB to prevent users
>> from having a bad experience when going from FileStore to BlueStore.
>>
>> I have created a pull request for this:
>> https://github.com/ceph/ceph/pull/20940
>>
>> Opinions, experiences, feedback?
>
> The thinking was that bluestore requires some deliberate thinking and
> tuning on the cache size, so we may as well pick defaults that make
> sense. Since the admin is doing the filestore -> bluestore conversion,
> that is the point where they consider the memory requirement and adjust
> the config as necessary.

I understand the thinking, but I think it doesn't apply.

> As for why the defaults are different, the SSDs need a larger cache to
> capture the SSD performance, and the nodes that have them are likely to
> be "higher end" and have more memory. The idea is to minimize the number
> of people that will need to adjust their config.

Is 3GB really required for an OSD? Or might 2GB also work?

> Perhaps the missing piece here is that the filestore->bluestore
> conversion doc should have a section about memory requirements and
> tuning bluestore_cache_size accordingly? If we just reduce the default
> to satisfy the lowest common denominator we'll kill performance for the
> majority that has more memory.

Is 8 OSDs on 32GB really that low? If we look at the docs:

http://docs.ceph.com/docs/master/start/hardware-recommendations/#ram

"OSDs do not require as much RAM for regular operations (e.g., 500MB of
RAM per daemon instance); however, during recovery they need
significantly more RAM (e.g., ~1GB per 1TB of storage per daemon).
Generally, more RAM is better."

So somebody whose machines have been running just fine with FileStore
for the last 2 years doesn't expect that when switching to BlueStore
they have to look into this. They are however faced with the OOM-killer
and frustrated people in their organization, who at that point blame
BlueStore.

I've been called in for these kinds of situations a few times in the
last months, and in all cases I had to lower the BlueStore cache size.

Yes, I do agree that people should read the Release Notes, but is that
really sufficient? Not everybody will do that.

I'd say:

- Lower the cache size for SSD to 1GB or 2GB
- Update the docs to tell people to increase the cache to improve
  performance

The OOM-killer can become a true snowball effect in clusters, which is a
serious issue for people.
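To make that concrete: on the 8-SSD/32GB machine above, the 3GB default
alone commits 24GB to BlueStore caches, before the OSDs' baseline memory
use, cache overhead, or recovery spikes are counted. The workaround I've
been applying is a minimal ceph.conf change like the sketch below (values
are in bytes; as far as I know the option is read at OSD startup, so the
OSDs need a restart to pick it up):

    [osd]
    # Cap the BlueStore cache at 1 GiB per OSD on SSD-backed OSDs
    # (1073741824 bytes). Raise this again on nodes with memory to spare.
    bluestore_cache_size_ssd = 1073741824

You can sanity-check what an OSD is actually consuming before and after
with 'ceph daemon osd.<id> dump_mempools' on the node in question.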
I'd rather have slightly lower performance than daemons going down.

Wido

> sage