Hello,

Definitely seeing about 20% overhead with Hammer as well, so it is not
version specific from where I'm standing.

While non-RBD storage VMs by and large tend to stay closer to the specified
size, I've seen them exceed it by a few % at times, too. For example a
4317968KB RSS one that ought to be 4GB.

Regards,

Christian

On Thu, 27 Apr 2017 09:56:48 +0200 nick wrote:

> Hi,
> we are running a jewel ceph cluster which serves RBD volumes for our KVM
> virtual machines. Recently we noticed that our KVM machines use a lot more
> memory on the physical host system than they should. We collect the data
> with a python script which basically executes 'virsh dommemstat <virtual
> machine name>'. We also verified the results of the script with the memory
> stats of 'cat /proc/<kvm PID>/status' for each virtual machine and the
> results are the same.
>
> Here is an excerpt for one physical host where all virtual machines have
> been running since yesterday (virtual machine names removed):
>
> """
> overhead     actual     percent_overhead   rss
> ----------   --------   ----------------   --------
> 423.8 MiB    2.0 GiB    20                 2.4 GiB
> 460.1 MiB    4.0 GiB    11                 4.4 GiB
> 471.5 MiB    1.0 GiB    46                 1.5 GiB
> 472.6 MiB    4.0 GiB    11                 4.5 GiB
> 681.9 MiB    8.0 GiB    8                  8.7 GiB
> 156.1 MiB    1.0 GiB    15                 1.2 GiB
> 278.6 MiB    1.0 GiB    27                 1.3 GiB
> 290.4 MiB    1.0 GiB    28                 1.3 GiB
> 291.5 MiB    1.0 GiB    28                 1.3 GiB
> 0.0 MiB      16.0 GiB   0                  13.7 GiB
> 294.7 MiB    1.0 GiB    28                 1.3 GiB
> 135.6 MiB    1.0 GiB    13                 1.1 GiB
> 0.0 MiB      2.0 GiB    0                  1.4 GiB
> 1.5 GiB      4.0 GiB    37                 5.5 GiB
> """
>
> We are using the rbd client cache for our virtual machines, but it is set to
> only 128MB per machine. There is also only one rbd volume per virtual
> machine. We have seen more than 200% memory overhead per KVM machine on
> other physical machines. After a live migration of the virtual machine to
> another host the overhead is back to 0 and slowly climbs back to high
> values.
>
> Here are our ceph.conf settings for the clients:
> """
> [client]
> rbd cache writethrough until flush = False
> rbd cache max dirty = 100663296
> rbd cache size = 134217728
> rbd cache target dirty = 50331648
> """
>
> We have noticed this behavior since we started using the jewel librbd
> libraries. We did not encounter it with the ceph infernalis librbd version.
> We also do not see this issue when using local storage instead of ceph.
>
> Some version information of the physical host which runs the KVM machines:
> """
> OS: Ubuntu 16.04
> kernel: 4.4.0-75-generic
> librbd: 10.2.7-1xenial
> """
>
> We did try to flush and invalidate the client cache via the ceph admin
> socket, but this did not change any memory usage values.
>
> Does anyone encounter similar issues or have an explanation for the high
> memory overhead?
>
> Best Regards
> Sebastian

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
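
For reference, a minimal sketch of a collection script along the lines
described above might look like the following. It relies only on the
'actual' and 'rss' fields that 'virsh dommemstat' prints (both in KiB); the
column layout, domain-name handling and helper names are illustrative
assumptions, not the poster's original script.

"""
#!/usr/bin/env python
# Minimal sketch: parse 'virsh dommemstat <domain>' and report the gap
# between the balloon target ('actual') and the host-side RSS of the qemu
# process. Illustration only, not the original poster's script; domain
# names are taken from the command line.
import subprocess
import sys


def dommemstat(domain):
    # Return the numeric fields of 'virsh dommemstat <domain>' as a dict
    # of KiB values (e.g. {'actual': 2097152, 'rss': 2514376, ...}).
    out = subprocess.check_output(['virsh', 'dommemstat', domain],
                                  universal_newlines=True)
    stats = {}
    for line in out.splitlines():
        key, _, value = line.partition(' ')
        if value.strip().isdigit():
            stats[key] = int(value.strip())
    return stats


def mib(kib):
    return '%.1f MiB' % (kib / 1024.0)


def main():
    row = '%-20s %12s %12s %12s %18s'
    print(row % ('domain', 'actual', 'rss', 'overhead', 'percent_overhead'))
    for domain in sys.argv[1:]:
        stats = dommemstat(domain)
        actual = stats.get('actual', 0)   # balloon size in KiB
        rss = stats.get('rss', 0)         # RSS of the qemu process in KiB
        overhead = max(rss - actual, 0)
        percent = 100 * overhead // actual if actual else 0
        print(row % (domain, mib(actual), mib(rss), mib(overhead), percent))


if __name__ == '__main__':
    main()
"""

Run against one or more domain names (e.g. './memstat.py vm1 vm2', with a
hypothetical file name), this produces a table similar to the one quoted
above; per the poster's verification, the rss value should match VmRSS in
/proc/<kvm PID>/status for the corresponding qemu process.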