Ceph memory overhead when used with KVM

Hi,
we are running a jewel ceph cluster which serves RBD volumes for our KVM 
virtual machines. Recently we noticed that the KVM processes use a lot more 
memory on the physical host system than they should. We collect the data with a 
Python script which basically executes 'virsh dommemstat <virtual machine name>' 
for each domain. We also verified the results of the script against the memory 
stats from 'cat /proc/<kvm PID>/status' for each virtual machine, and both 
sources agree.
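
For reference, a stripped-down sketch of the collection logic (the real script does 
more; the domain name and PID below are just placeholders):

"""
#!/usr/bin/env python
# Minimal sketch: compare the balloon size reported by 'virsh dommemstat'
# with the resident set size (VmRSS) of the corresponding qemu process.
import subprocess

def dommemstat(domain):
    # Parse 'virsh dommemstat <domain>' into a dict of KiB values.
    out = subprocess.check_output(['virsh', 'dommemstat', domain]).decode()
    return dict((k, int(v)) for k, v in
                (line.split() for line in out.splitlines() if line.strip()))

def qemu_rss_kib(pid):
    # Read VmRSS (KiB) from /proc/<pid>/status.
    with open('/proc/%d/status' % pid) as status:
        for line in status:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])
    return 0

stats = dommemstat('vm-example')   # contains 'actual' = balloon size in KiB
rss = qemu_rss_kib(12345)          # RSS of the matching qemu process
print('actual=%d KiB rss=%d KiB' % (stats.get('actual', 0), rss))
"""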

Here is an excerpt for one physical host where all virtual machines have been 
running since yesterday (virtual machine names removed):

"""
overhead    actual    percent_overhead  rss
----------  --------  ----------------  --------
423.8 MiB   2.0 GiB                 20  2.4 GiB
460.1 MiB   4.0 GiB                 11  4.4 GiB
471.5 MiB   1.0 GiB                 46  1.5 GiB
472.6 MiB   4.0 GiB                 11  4.5 GiB
681.9 MiB   8.0 GiB                  8  8.7 GiB
156.1 MiB   1.0 GiB                 15  1.2 GiB
278.6 MiB   1.0 GiB                 27  1.3 GiB
290.4 MiB   1.0 GiB                 28  1.3 GiB
291.5 MiB   1.0 GiB                 28  1.3 GiB
0.0 MiB     16.0 GiB                 0  13.7 GiB
294.7 MiB   1.0 GiB                 28  1.3 GiB
135.6 MiB   1.0 GiB                 13  1.1 GiB
0.0 MiB     2.0 GiB                  0  1.4 GiB
1.5 GiB     4.0 GiB                 37  5.5 GiB
"""
 
We are using the rbd client cache for our virtual machines, but it is limited to 
128 MB per machine, and there is only one rbd volume per virtual machine. On other 
physical hosts we have seen more than 200% memory overhead per KVM machine. After a 
live migration of a virtual machine to another host the overhead drops back to 0 and 
then slowly climbs back up to high values.

Here are our ceph.conf settings for the clients:
"""
[client]
rbd cache writethrough until flush = False
rbd cache max dirty = 100663296       # 96 MiB
rbd cache size = 134217728            # 128 MiB
rbd cache target dirty = 50331648     # 48 MiB
"""

We have noticed this behavior since we started using the jewel librbd libraries; we 
did not encounter it with the infernalis librbd version. We also do not see the 
issue when using local storage instead of ceph.

Some version information of the physical host which runs the KVM machines:
"""
OS: Ubuntu 16.04
kernel: 4.4.0-75-generic
librbd: 10.2.7-1xenial
"""

We did try to flush and invalidate the client cache via the ceph admin socket, but 
this did not change the reported memory usage at all.
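
For completeness, the flush/invalidate commands are registered on the same librbd 
admin socket; 'help' lists the exact command names, e.g.:

"""
# 'help' lists everything the librbd instance registered on its admin socket,
# including the cache flush/invalidate commands and the usual 'perf dump'.
import subprocess
print(subprocess.check_output(
    ['ceph', '--admin-daemon',
     '/var/run/ceph/ceph-client.libvirt.12345.asok',   # placeholder path
     'help']).decode())
"""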

Has anyone encountered similar issues, or does anyone have an explanation for the 
high memory overhead?

Best Regards
Sebastian

