Hi Mark,
thanks a lot for your explanation and clarification.
Adjusting osd_memory_target to fit within our systems' RAM did the trick.
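In case it helps anyone else, lowering the target is just a matter of something like the below (the value is only an example, not necessarily the one we picked; it depends on RAM and OSD count per host):

    # ceph.conf on each OSD host -- example value, roughly 2.5GB per OSD
    [osd]
    osd_memory_target = 2684354560

    # it can also be injected into running OSDs, e.g.
    ceph tell osd.* injectargs '--osd_memory_target 2684354560'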
Jaime
On 07/08/2019 14:09, Mark Nelson wrote:
Hi Jaime,
we only use the cache size parameters now if you've disabled
autotuning. With autotuning we adjust the cache size on the fly to
try to keep the mapped process memory under the osd_memory_target.

You can set a lower memory target than the default, though you will
have far less cache for bluestore onodes and rocksdb. You may notice
that it's slower, especially if you have a big active data set you are
processing. I don't usually recommend setting the osd_memory_target
below 2GB. At some point the autotuner will have shrunk the caches as
far as it can and the process memory may start exceeding the target
(with our default rocksdb and pglog settings this usually happens
somewhere between 1.3-1.7GB once the OSD has been sufficiently
saturated with IO).

Given memory prices right now, I'd still recommend upgrading RAM
if you have the ability, though. You might be able to get away with
setting each OSD to 2-2.5GB in your scenario, but you'll be pushing it.
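Rough numbers for a 12-OSD, 32GB host like yours (ignoring the OS,
page cache and other daemons):

    12 OSDs x 2.5GB target = 30GB  (~2GB left for everything else)
    12 OSDs x 2.0GB target = 24GB  (~8GB left for everything else)

which is why I'd treat 2GB as the practical floor there.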
I would not recommend lowering osd_memory_cache_min. You really
want the rocksdb indexes/filters fitting in cache, and as many bluestore
onodes as you can get. In any event, you'll still be bound by the
(currently hardcoded) 64MB cache chunk allocation size in the
autotuner, which osd_memory_cache_min can't reduce (and that's per
cache, while osd_memory_cache_min is global for the kv, buffer, and
rocksdb block caches). I.e., each cache is going to get 64MB plus growth
room regardless of how low you set osd_memory_cache_min. That's
intentional: we don't want a single SST file in rocksdb to be able
to completely blow everything else out of the block cache during
compaction, only to quickly become invalid, be removed from the cache,
and make it look to the priority cache system like rocksdb doesn't
actually need any more memory for cache.
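So as a rough floor from those chunks alone:

    3 caches (kv, buffer, rocksdb block) x 64MB = ~192MB per OSD, before any growth.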
Mark
On 8/7/19 7:44 AM, Jaime Ibar wrote:
Hi all,
we run a Ceph Luminous 12.2.12 cluster with 7 OSD servers, 12x4TB disks
each.
Recently we redeployed the OSDs of one of them using the bluestore
backend; however, since then we've been hitting out-of-memory errors
(oom-killer invoked) and the OS kills one of the ceph-osd processes.
The OSD is restarted automatically and is back online after one minute.
We're running Ubuntu 16.04, kernel 4.15.0-55-generic.
The server has 32GB of RAM and a 4GB swap partition.
All the disks are HDDs; there are no SSDs.
Bluestore settings are the default ones:
"osd_memory_target": "4294967296"
"osd_memory_cache_min": "134217728"
"bluestore_cache_size": "0"
"bluestore_cache_size_hdd": "1073741824"
"bluestore_cache_autotune": "true"
As stated in the documentation, bluestore assigns by default 4GB of
RAM per OSD (1GB of RAM per 1TB).
So in this case 48GB of RAM would be needed. Am I right?
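(Rough math: 12 OSDs x 4GB default osd_memory_target = 48GB, while the
box has 32GB of RAM + 4GB of swap = 36GB.)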
Are these the minimum requirements for bluestore?
In case adding more RAM is not an option, can any of
osd_memory_target, osd_memory_cache_min or bluestore_cache_size_hdd
be decreased to fit our server specs?
Would this have any impact on performance?
Thanks
Jaime
--
Jaime Ibar
High Performance & Research Computing, IS Services
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/ | jaime@xxxxxxxxxxxx
Tel: +353-1-896-3725
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com