Hi,

On my SSD cluster (1.6 TB Intel S3610), I'm seeing about 1 GB of RSS memory
per OSD on filestore versus 7.5-8.5 GB on bluestore (with the default
ceph.conf, no tuning).

Currently, I restart my OSDs every 2 weeks to avoid running out of memory.

Is this normal? I'm far above 3 GB of memory per OSD.

filestore jewel
---------------
USER     PID %CPU %MEM     VSZ     RSS TTY STAT START     TIME COMMAND
ceph   48957 13.9  1.6 2984276 1097996 ?   Ssl   2017 65408:50 /usr/bin/ceph-osd -f --cluster ceph --id 3 --setuser ceph --setgroup ceph

bluestore luminous
------------------
USER     PID %CPU %MEM     VSZ     RSS TTY STAT START    TIME COMMAND
ceph 1718009  2.5 11.7 8542012 7725992 ?   Ssl   2017 2463:28 /usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph

# ceph daemon osd.5 dump_mempools
{
    "bloom_filter": { "items": 0, "bytes": 0 },
    "bluestore_alloc": { "items": 98449088, "bytes": 98449088 },
    "bluestore_cache_data": { "items": 759, "bytes": 17276928 },
    "bluestore_cache_onode": { "items": 884140, "bytes": 594142080 },
    "bluestore_cache_other": { "items": 116375567, "bytes": 2072801299 },
    "bluestore_fsck": { "items": 0, "bytes": 0 },
    "bluestore_txc": { "items": 6, "bytes": 4320 },
    "bluestore_writing_deferred": { "items": 99, "bytes": 1190045 },
    "bluestore_writing": { "items": 11, "bytes": 4510159 },
    "bluefs": { "items": 1202, "bytes": 64136 },
    "buffer_anon": { "items": 76863, "bytes": 21327234 },
    "buffer_meta": { "items": 910, "bytes": 80080 },
    "osd": { "items": 328, "bytes": 3956992 },
    "osd_mapbl": { "items": 0, "bytes": 0 },
    "osd_pglog": { "items": 1118050, "bytes": 286277600 },
    "osdmap": { "items": 6073, "bytes": 551872 },
    "osdmap_mapping": { "items": 0, "bytes": 0 },
    "pgmap": { "items": 0, "bytes": 0 },
    "mds_co": { "items": 0, "bytes": 0 },
    "unittest_1": { "items": 0, "bytes": 0 },
    "unittest_2": { "items": 0, "bytes": 0 },
    "total": { "items": 216913096, "bytes": 3100631833 }
}

----- Original Message -----
From: "Mark Nelson" <mark.a.nelson@xxxxxxxxx>
To: "Sage Weil" <sage@xxxxxxxxxxxx>, "Wido den Hollander" <wido@xxxxxxxx>
Cc:
    "ceph-devel" <ceph-devel@xxxxxxxxxxxxxxx>
Sent: Friday, 16 March 2018 17:41:03
Subject: Re: bluestore_cache_size_ssd and bluestore_cache_size_hdd default values

On 03/16/2018 11:08 AM, Sage Weil wrote:
> On Fri, 16 Mar 2018, Wido den Hollander wrote:
>> Hi,
>>
>> The config values bluestore_cache_size_ssd and bluestore_cache_size_hdd
>> determine how much memory an OSD running with BlueStore will use for caching.
>>
>> By default the values are:
>>
>> bluestore_cache_size_ssd = 3GB
>> bluestore_cache_size_hdd = 1GB
>>
>> I've seen some cases recently where users migrated from FileStore to
>> BlueStore and had the OOM-killer come along during backfill/recovery
>> situations. These are the situations where OSDs require more memory.
>>
>> It's not uncommon to find servers with:
>>
>> - 8 SSDs and 32GB RAM
>> - 16 SSDs and 64GB RAM
>>
>> With FileStore it was sufficient since the page cache did all the work,
>> but with BlueStore each OSD has its own cache which isn't shared.
>>
>> In addition there is the regular memory consumption and the overhead of
>> the cache.
>>
>> I also don't understand the ideas behind the values. As HDDs are slower
>> they usually require more cache than SSDs, so I'd expect the values to be
>> flipped.
>>
>> My recommendation would be to lower the value to 1GB to prevent users
>> from having a bad experience when going from FileStore to BlueStore.
>>
>> I have created a pull request for this:
>> https://github.com/ceph/ceph/pull/20940
>>
>> Opinions, experiences, feedback?
>
> The thinking was that bluestore requires some deliberate thinking
> and tuning on the cache size, so we may as well pick defaults that make
> sense. Since the admin is doing the filestore -> bluestore conversion,
> that is the point where they consider the memory requirement and adjust
> the config as necessary.
>
> As for why the defaults are different, the SSDs need a larger cache to
> capture the SSD performance, and the nodes that have them are likely to be
> "higher end" and have more memory. The idea is to minimize the number
> of people that will need to adjust their config.
>
> Perhaps the missing piece here is that the filestore->bluestore conversion
> doc should have a section about memory requirements and tuning
> bluestore_cache_size accordingly? If we just reduce the default to
> satisfy the lowest common denominator we'll kill performance for the
> majority that has more memory.

On a side note, we are not currently enforcing a hard cap on rocksdb
block cache usage. During certain test scenarios, I've observed the
block cache exceeding the soft cap during compaction. I suspect this is
primarily an issue when dealing with very fast storage and very low
memory, but it may contribute to scenarios where folks are going OOM on
low memory configurations.

Mark

>
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
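
For anyone sizing hosts from the figures in this thread, the arithmetic can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the helper names are made up for the example, "GB" is treated as GiB, and the RSS column from ps is assumed to be in KiB. It compares the per-host cache target (OSD count times bluestore_cache_size_ssd) against RAM for the two server sizes mentioned above, and shows how much of osd.5's RSS is not covered by the dump_mempools total. Note that the cache size is only the cache target; as Wido points out, regular memory consumption and cache overhead come on top of it.

```python
# Rough memory-budget arithmetic for the figures quoted in this thread.
# Assumptions (not from the thread): the helper names are hypothetical,
# "GB" is treated as GiB, and ps reports RSS in KiB.

GIB = 1024 ** 3


def cache_budget_gib(num_osds, cache_size_gib):
    """Sum of the BlueStore cache targets for all OSDs on one host, in GiB."""
    return num_osds * cache_size_gib


def cache_fraction(num_osds, cache_size_gib, ram_gib):
    """Fraction of host RAM taken by the cache targets alone."""
    return cache_budget_gib(num_osds, cache_size_gib) / ram_gib


def untracked_bytes(rss_kib, mempool_total_bytes):
    """Bytes of RSS not accounted for by the dump_mempools 'total' entry."""
    return rss_kib * 1024 - mempool_total_bytes


if __name__ == "__main__":
    # Server sizes from Wido's mail, with the 3GB default and the proposed 1GB.
    for osds, ram in [(8, 32), (16, 64)]:
        for cache in (3, 1):
            frac = cache_fraction(osds, cache, ram)
            print(f"{osds} OSDs x {cache} GiB cache on {ram} GiB RAM: "
                  f"{cache_budget_gib(osds, cache)} GiB ({frac:.0%} of RAM)")

    # osd.5 figures from the ps and dump_mempools output above.
    rss_kib = 7725992
    mempool_total = 3100631833
    gap = untracked_bytes(rss_kib, mempool_total)
    print(f"osd.5 RSS:           {rss_kib * 1024 / GIB:.2f} GiB")
    print(f"osd.5 mempool total: {mempool_total / GIB:.2f} GiB")
    print(f"not in mempools:     {gap / GIB:.2f} GiB")
```

With the 3GB default, both example servers commit three quarters of their RAM to cache targets before any other OSD memory is counted, which leaves little headroom during backfill or recovery.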