On 08/14/2017 02:42 PM, Nick Fisk wrote:
-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
Ronny Aasen
Sent: 14 August 2017 18:55
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re: luminous/bluestore osd memory requirements
On 10.08.2017 17:30, Gregory Farnum wrote:
This has been discussed a lot in the performance meetings so I've
added Mark to discuss. My naive recollection is that the per-terabyte
recommendation will be more realistic than it was in the past (an
effective increase in memory needs), but also that it will be under
much better control than previously.
Is there any way to tune or reduce the memory footprint, perhaps by
sacrificing performance? Our jewel cluster's OSD servers are maxed out on
memory, and with the added memory requirements I fear we may not be
able to upgrade to luminous/bluestore.
Check out this PR; it shows the settings that control how much memory is
used for cache, and their defaults:
https://github.com/ceph/ceph/pull/16157
Hey guys, sorry for the late reply. The gist of it is that memory is
used in bluestore in a couple of different ways:
1) various internal buffers and such
2) bluestore specific cache (unencoded onodes, extents, etc)
3) rocksdb block cache
3a) encoded data from bluestore
3b) bloom filters and table indexes
4) other rocksdb memory/etc
Right now when you set the bluestore cache size, it first favors the rocksdb
block cache up to 512MB and then starts favoring the bluestore onode cache
after that. Even without bloom filters, that seems to improve bluestore
performance with small cache sizes. With bloom filters it's likely even
more important to feed whatever you can to rocksdb's block cache to keep
the index and bloom filters in memory as much as possible. It's unclear
right now how quickly we should let the block cache grow as the number
of objects increases. Prior to using bloom filters it appeared that
favoring the onode cache was better. Now we probably want to favor both
the bloom filters and bluestore's onode cache.
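The cache-split behavior described above can be modeled roughly as follows. This is a simplified Python sketch, not BlueStore's actual code: it assumes a hard 512MB cap on the rocksdb block cache share (the figure quoted above) and ignores any other ratios or internal buffers. The names `split_cache` and `BLOCK_CACHE_CAP` are hypothetical.

```python
MIB = 1024 * 1024

# Assumed cap on the rocksdb block cache share, taken from the 512MB figure above.
BLOCK_CACHE_CAP = 512 * MIB


def split_cache(cache_size: int, block_cache_cap: int = BLOCK_CACHE_CAP) -> dict:
    """Split a total cache budget (bytes) between rocksdb and bluestore.

    Simplified model of the described policy: the rocksdb block cache is
    filled first, up to block_cache_cap; anything left over goes to the
    bluestore onode cache.
    """
    if cache_size < 0:
        raise ValueError("cache_size must be non-negative")
    block_cache = min(cache_size, block_cache_cap)
    onode_cache = cache_size - block_cache
    return {"rocksdb_block_cache": block_cache, "bluestore_onode_cache": onode_cache}


if __name__ == "__main__":
    for total_mib in (256, 512, 1024, 3072):
        split = split_cache(total_mib * MIB)
        print(f"{total_mib:>5} MiB total -> "
              f"block cache {split['rocksdb_block_cache'] // MIB} MiB, "
              f"onode cache {split['bluestore_onode_cache'] // MIB} MiB")
```

In this model a 1 GiB budget gives 512 MiB to the block cache and 512 MiB to the onode cache, while a 256 MiB budget goes entirely to the block cache. So shrinking the cache on a memory-starved OSD takes memory away from the onode cache first.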
So the first order of business is to see how changing the bluestore
cache size hurts you. Bluestore's default behavior of favoring the
rocksdb block cache (and specifically the bloom filters) first is
probably still decent but you may want to play around with it if you
expect a lot of small objects and limited memory. For really low memory
scenarios you could also try reducing the rocksdb buffer sizes, but
smaller buffers are going to give you higher write-amp. It's possible
this PR may help though:
https://github.com/ceph/rocksdb/pull/19
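To see why smaller rocksdb buffers save memory: RocksDB can keep up to `max_write_buffer_number` memtables, each of up to `write_buffer_size` bytes, per column family before it stalls writes, so their product is a rough upper bound on memtable memory. The sketch below only does that arithmetic. The buffer sizes it uses are illustrative, not Ceph's defaults, and `memtable_upper_bound` is a hypothetical helper.

```python
MIB = 1024 * 1024


def memtable_upper_bound(write_buffer_size: int, max_write_buffer_number: int) -> int:
    """Rough upper bound (bytes) on memtable memory for one column family.

    Ignores allocator/arena overhead; it is just the product of how many
    memtables may exist at once and how large each may grow.
    """
    if write_buffer_size <= 0 or max_write_buffer_number <= 0:
        raise ValueError("both arguments must be positive")
    return write_buffer_size * max_write_buffer_number


if __name__ == "__main__":
    # Illustrative values only; check the rocksdb options your OSDs actually use.
    for buf_mib in (256, 64, 16):
        total = memtable_upper_bound(buf_mib * MIB, 4)
        print(f"write_buffer_size={buf_mib} MiB x 4 -> up to {total // MIB} MiB")
```

The write-amp trade-off comes from the same knob: smaller memtables fill and flush more often, which generally means more, smaller L0 files and more compaction work.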
You might be able to lower memory further with smaller PG/OSD maps, but
at some point you start hitting diminishing returns.
Mark
kind regards
Ronny Aasen
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com