We have a 9-node cluster (16 x 8TB OSDs per node) running jewel on CentOS 7.4. The OSDs are configured with encryption. The cluster is accessed via two RGWs, and there are 3 mon servers. The data pool is using 6+3 erasure coding.

About two weeks ago I found two of the nine servers wedged and had to hard power cycle them to get them back. After this hard reboot, 22 OSDs came back with corrupted encryption or data partitions. These OSDs were removed and recreated, and the resulting rebalance moved along just fine for about a week. At the end of that week, two different nodes became unresponsive, complaining of page allocation failures. This is when I realized the nodes were heavily into swap. These nodes had been configured with 64GB of RAM as a cost saving, going against the 1GB-per-1TB recommendation. We have since doubled the RAM in each node, giving each of them more than the 1GB-per-1TB ratio.

The issue I am running into is that these nodes are still swapping, a lot, and over time they become unresponsive or throw page allocation failures. As an example, "free" will show 15GB of RAM usage (out of 128GB) and 32GB of swap. I have set vm.swappiness to 0 and turned vm.min_free_kbytes up to 4GB to try to keep the kernel happy, yet I am still filling up swap. It only occurs when the OSDs have mounted partitions and the ceph-osd daemons are active.

Anyone have an idea where this swap usage might be coming from?

Thanks for any insight,

Sam Liston (sam.liston@xxxxxxxx)
====================================
Center for High Performance Computing
155 S. 1452 E. Rm 405
Salt Lake City, Utah 84112
(801)232-6932
====================================
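P.S. To make the "where is it going" question concrete: below is a minimal sketch (plain Python over /proc, nothing Ceph-specific, and the 20-process cutoff is arbitrary) that sums the VmSwap field per process from /proc/<pid>/status, so you can see whether the ceph-osd daemons themselves own the swapped pages. smem(1) gives a similar breakdown if it's available.

    #!/usr/bin/env python
    # Attribute swap usage to processes by summing the VmSwap field from
    # /proc/<pid>/status (reported by the kernel in kB; kernel threads
    # have no VmSwap line and are skipped). Runs on Python 2.7 or 3.
    import os

    usage = []
    for pid in os.listdir('/proc'):
        if not pid.isdigit():
            continue
        name, swap_kb = '?', 0
        try:
            with open('/proc/%s/status' % pid) as f:
                for line in f:
                    if line.startswith('Name:'):
                        name = line.split()[1]
                    elif line.startswith('VmSwap:'):
                        swap_kb = int(line.split()[1])
        except (IOError, OSError):
            # process exited while we were reading it
            continue
        if swap_kb:
            usage.append((swap_kb, pid, name))

    usage.sort(reverse=True)
    total = sum(entry[0] for entry in usage)
    for swap_kb, pid, name in usage[:20]:
        print('%9d kB  pid %-6s %s' % (swap_kb, pid, name))
    print('process total: %d kB -- compare against "free"' % total)

If the per-process total comes in well below what "free" reports for swap, the remainder is swapped-out shmem/tmpfs pages rather than process memory, which would point away from the daemons themselves.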
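And since the growth only happens while the ceph-osd daemons are running, a simple poller over /proc/meminfo makes it easy to line up swap consumption against OSD start/stop and recovery activity. Another rough sketch along the same lines; the fields are standard, and any the kernel doesn't export simply print as 0:

    #!/usr/bin/env python
    # Log a few /proc/meminfo fields once a minute so swap growth can be
    # correlated with what the OSDs were doing at the time. All values
    # are reported by the kernel in kB; printed here in MB.
    import time

    FIELDS = ('MemFree', 'MemAvailable', 'SwapFree',
              'Slab', 'SReclaimable', 'Dirty')

    def snapshot():
        vals = {}
        with open('/proc/meminfo') as f:
            for line in f:
                key = line.split(':')[0]
                if key in FIELDS:
                    vals[key] = int(line.split()[1])
        return vals

    while True:
        snap = snapshot()
        print(time.strftime('%H:%M:%S') + '  ' +
              '  '.join('%s=%dM' % (k, snap.get(k, 0) // 1024)
                        for k in FIELDS))
        time.sleep(60)

Watching SwapFree fall while Slab/SReclaimable climb would point at kernel-side allocations rather than anonymous memory in the daemons.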