Hi all,
We have run into serious trouble with our cluster: a running cluster started failing, setting off a chain reaction until the Ceph cluster was down, with about half of the OSDs down (in an EC pool).
Each host has 8 OSDs of 8 TB each (i.e. a RAID 0 of two 4 TB disks) for an EC pool (10+3, across 14 hosts), plus 2 cache OSDs and 32 GB of RAM.
The reason we use RAID 0 across pairs of disks is that we tried 16 separate disks per host before, but 32 GB of RAM didn't seem enough to keep the cluster stable.
We don't know for sure what triggered the chain reaction, but what we do see is that our OSDs use a lot of memory while recovering. We've seen some OSDs using almost 8 GB of RAM (resident; 11 GB virtual).
So right now we don't have enough memory to recover the cluster, because the OSDs get killed by the OOM killer before they can finish recovering. And I don't know whether doubling our memory would be enough.
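To put numbers on that worry, here is a back-of-the-envelope sketch (assuming every OSD on a host can hit the ~8 GB resident peak at the same time):

# Per-host memory budget during recovery, using the numbers above.
# Assumption: all OSDs on one host peak at ~8 GB resident simultaneously.
osds_per_host = 8 + 2        # 8 EC OSDs + 2 cache OSDs
normal_gb_per_osd = 2        # what we used to see in normal operation
peak_gb_per_osd = 8          # resident peak we now see during recovery
ram_gb = 32

print("normal:", osds_per_host * normal_gb_per_osd, "GB of", ram_gb, "GB")   # 20 GB -> fits
print("recovery:", osds_per_host * peak_gb_per_osd, "GB of", ram_gb, "GB")   # 80 GB -> over, even at 64 GB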
A few questions:
* Has anyone seen this before?
* 2 GB per OSD was still normal, but 8 GB seems like a lot; is this expected behaviour?
* We didn't see this with a nearly empty cluster. Now it is filled to about a quarter (270 TB). I guess this will get worse when it is half full or more?
* How high can this memory usage get? Can we calculate the maximum memory usage of an OSD? Can we limit it? (See the sketch after these questions for what we are considering.)
* We could upgrade/reinstall to Infernalis; would that solve anything?
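Regarding limiting the memory usage: this is the kind of recovery throttling we are thinking of trying to keep memory down while the cluster heals. A sketch only, assuming the stock ceph CLI with admin caps and that osd_max_backfills / osd_recovery_max_active behave the same on our (Hammer-era) release:

# Sketch: throttle recovery/backfill on all running OSDs to ease memory pressure.
# Assumes the 'ceph' CLI is installed and a client keyring with admin caps.
import subprocess

subprocess.check_call([
    "ceph", "tell", "osd.*", "injectargs",
    "--osd_max_backfills 1 --osd_recovery_max_active 1",
])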
This is related to a previous post of mine:
http://permalink.gmane.org/gmane.comp.file-systems.ceph.user/22259
Thank you very much!!
Kenneth