Re: luminous/bluestore osd memory requirements


 



On 08/14/2017 02:42 PM, Nick Fisk wrote:
-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
Ronny Aasen
Sent: 14 August 2017 18:55
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  luminous/bluestore osd memory requirements

On 10.08.2017 17:30, Gregory Farnum wrote:
This has been discussed a lot in the performance meetings so I've
added Mark to discuss. My naive recollection is that the per-terabyte
recommendation will be more realistic than it was in the past (an
effective increase in memory needs), but also that it will be under
much better control than previously.


Is there any way to tune or reduce the memory footprint? Perhaps by
sacrificing performance? Our jewel cluster's OSD servers are maxed out on
memory, and with the added memory requirements I fear we may not be
able to upgrade to luminous/bluestore.

Check out this PR; it shows the settings that control the memory used for
caching, along with their defaults:

https://github.com/ceph/ceph/pull/16157

Hey guys, sorry for the late reply. The gist of it is that memory is used in bluestore in a few different ways:

1) various internal buffers and such
2) bluestore specific cache (unencoded onodes, extents, etc)
3) rocksdb block cache
  3a) encoded data from bluestore
  3b) bloom filters and table indexes
4) other rocksdb memory/etc

Right now when you set the bluestore cache size, it first favors the rocksdb block cache up to 512MB and then starts favoring the bluestore onode cache after that. Even without bloom filters, that seems to improve bluestore performance with small cache sizes. With bloom filters it's likely even more important to feed whatever you can to rocksdb's block cache to keep the indexes and bloom filters in memory as much as possible. It's unclear right now how quickly we should let the block cache grow as the number of objects increases. Prior to using bloom filters it appeared that favoring the onode cache was better; now we probably want to favor both the bloom filters and bluestore's onode cache.
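For reference, the knobs involved look roughly like the sketch below. Option names and default values are taken from my reading of the Luminous-era code; treat them as assumptions and verify against the defaults shipped with your release before changing anything.

```ini
# Hypothetical ceph.conf sketch (Luminous-era option names; verify
# against your version before use).
[osd]
# Total bluestore cache per OSD; 0 means use the hdd/ssd-specific
# defaults below.
bluestore_cache_size = 0
bluestore_cache_size_hdd = 1073741824   # ~1 GB default for HDD OSDs
bluestore_cache_size_ssd = 3221225472   # ~3 GB default for SSD OSDs
# Cap on how much of the cache is given to the rocksdb block cache
# first, before the remainder goes to the bluestore onode cache.
bluestore_cache_kv_max = 536870912      # 512 MB, as described above
```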

So the first order of business is to see how much shrinking the bluestore cache size hurts you. Bluestore's default behavior of favoring the rocksdb block cache (and specifically the bloom filters) first is probably still decent, but you may want to play around with it if you expect a lot of small objects and limited memory. For really low-memory scenarios you could also try reducing the rocksdb buffer sizes, but smaller buffers are going to give you higher write-amplification. It's possible this PR may help though:

https://github.com/ceph/rocksdb/pull/19
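A low-memory configuration along the lines described above might look like the following sketch. Note that the rocksdb option string here is an assumption: setting bluestore_rocksdb_options replaces the entire default string for your release, so copy your version's default first and only change the buffer-related fields.

```ini
# Hypothetical low-memory sketch -- smaller caches and smaller rocksdb
# write buffers trade performance (and higher write-amplification) for RAM.
[osd]
bluestore_cache_size = 536870912        # shrink the total cache to ~512 MB
# Smaller and fewer memtables; the exact string is illustrative only --
# start from your release's bluestore_rocksdb_options default.
bluestore_rocksdb_options = "write_buffer_size=33554432,max_write_buffer_number=2"
```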

You might be able to lower memory further with smaller PG/OSD maps, but at some point you start hitting diminishing returns.

Mark




kind regards
Ronny Aasen
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



