I know we noticed high memory usage due to librados in the Ceph
multipathd checker [1] -- on the order of hundreds of megabytes. That
client was probably nearly as trivial as an application can get, and I
just assumed it was due to large monitor maps being sent to the client
for whatever reason. Since we changed course on our RBD iSCSI
implementation, the investigation into this high memory usage
unfortunately fell by the wayside.

[1] http://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=blob;f=libmultipath/checkers/rbd.c;h=9ea0572f2b5bd41b80bf2601137b74f92bdc7278;hb=HEAD

On Thu, Apr 27, 2017 at 5:26 AM, nick <nick@xxxxxxx> wrote:
> Hi Christian,
> thanks for your answer.
> The highest value I can see for a local storage VM in our infrastructure is a
> memory overhead of 39%. This is big, but the majority (>90%) of our local
> storage VMs are using less than 10% memory overhead.
> For Ceph storage based VMs this looks quite different. The highest value I can
> see currently is 244% memory overhead, so that specific VM with 3 GB of
> allocated memory is now using 10.3 GB of RSS on the physical host. This is a
> really huge value. In general I can see that the majority of the Ceph based
> VMs have more than 60% memory overhead.
>
> Maybe this is also a bug related to qemu+librbd. It would just be nice to know
> whether other people are seeing such high values as well.
>
> Cheers
> Sebastian
>
> On Thursday, April 27, 2017 06:10:48 PM you wrote:
>> Hello,
>>
>> Definitely seeing about 20% overhead with Hammer as well, so it is not
>> version specific from where I'm standing.
>>
>> While non-RBD storage VMs by and large tend to be closer to the specified
>> size, I've seen them exceed it by a few % at times, too.
>> For example a 4317968 KB RSS one that ought to be 4 GB.
>>
>> Regards,
>>
>> Christian
>>
>> On Thu, 27 Apr 2017 09:56:48 +0200 nick wrote:
>> > Hi,
>> > we are running a Jewel Ceph cluster which serves RBD volumes for our KVM
>> > virtual machines. Recently we noticed that our KVM machines use a lot more
>> > memory on the physical host system than they should. We collect the data
>> > with a Python script which basically executes 'virsh dommemstat
>> > <virtual machine name>'. We also verified the results of the script
>> > against the memory stats in 'cat /proc/<kvm PID>/status' for each virtual
>> > machine, and the results are the same.
>> >
>> > Here is an excerpt for one physical host where all virtual machines have
>> > been running since yesterday (virtual machine names removed):
>> >
>> > """
>> > overhead    actual    percent_overhead  rss
>> > ----------  --------  ----------------  --------
>> > 423.8 MiB   2.0 GiB   20                2.4 GiB
>> > 460.1 MiB   4.0 GiB   11                4.4 GiB
>> > 471.5 MiB   1.0 GiB   46                1.5 GiB
>> > 472.6 MiB   4.0 GiB   11                4.5 GiB
>> > 681.9 MiB   8.0 GiB   8                 8.7 GiB
>> > 156.1 MiB   1.0 GiB   15                1.2 GiB
>> > 278.6 MiB   1.0 GiB   27                1.3 GiB
>> > 290.4 MiB   1.0 GiB   28                1.3 GiB
>> > 291.5 MiB   1.0 GiB   28                1.3 GiB
>> > 0.0 MiB     16.0 GiB  0                 13.7 GiB
>> > 294.7 MiB   1.0 GiB   28                1.3 GiB
>> > 135.6 MiB   1.0 GiB   13                1.1 GiB
>> > 0.0 MiB     2.0 GiB   0                 1.4 GiB
>> > 1.5 GiB     4.0 GiB   37                5.5 GiB
>> > """
>> >
>> > We are using the RBD client cache for our virtual machines, but it is set
>> > to only 128 MB per machine. There is also only one RBD volume per virtual
>> > machine. We have seen more than 200% memory overhead per KVM machine on
>> > other physical machines. After a live migration of the virtual machine to
>> > another host the overhead goes back to 0 and then slowly climbs back to
>> > high values.
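For reference, nick's collection script is not included in the thread; a
minimal sketch of that kind of measurement (parsing 'virsh dommemstat'
and cross-checking against VmRSS from /proc/<pid>/status) could look like
the following. The helper names and the optional PID argument are
illustrative assumptions, not the actual script, and it assumes virsh
reports its values in KiB:

"""
#!/usr/bin/env python3
# Hedged sketch: per-VM memory overhead from 'virsh dommemstat',
# optionally cross-checked against VmRSS in /proc/<pid>/status.
import subprocess
import sys

def dommemstat(domain):
    # Parse 'virsh dommemstat <domain>' output into a dict of KiB values.
    out = subprocess.check_output(["virsh", "dommemstat", domain], text=True)
    stats = {}
    for line in out.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats

def proc_vmrss_kib(pid):
    # Read VmRSS (kB) of the qemu process from /proc/<pid>/status.
    with open("/proc/%d/status" % pid) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return None

def overhead(domain, pid=None):
    stats = dommemstat(domain)
    actual_kib = stats["actual"]       # memory currently given to the guest
    rss_kib = stats.get("rss")         # RSS of the qemu process, per libvirt
    if pid is not None:
        rss_kib = proc_vmrss_kib(pid)  # cross-check against /proc instead
    over_kib = max(rss_kib - actual_kib, 0)
    return {
        "overhead_mib": round(over_kib / 1024.0, 1),
        "actual_gib": round(actual_kib / 1024.0 ** 2, 1),
        "percent_overhead": int(100.0 * over_kib / actual_kib),
        "rss_gib": round(rss_kib / 1024.0 ** 2, 1),
    }

if __name__ == "__main__":
    pid = int(sys.argv[2]) if len(sys.argv) > 2 else None
    print(overhead(sys.argv[1], pid))
"""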
>> >
>> > Here are our ceph.conf settings for the clients:
>> > """
>> > [client]
>> > rbd cache writethrough until flush = False
>> > rbd cache max dirty = 100663296
>> > rbd cache size = 134217728
>> > rbd cache target dirty = 50331648
>> > """
>> >
>> > We have noticed this behavior since we started using the Jewel librbd
>> > libraries. We did not encounter it when using the Ceph Infernalis librbd
>> > version. We also do not see this issue when using local storage instead
>> > of Ceph.
>> >
>> > Some version information for the physical host which runs the KVM
>> > machines:
>> > """
>> > OS: Ubuntu 16.04
>> > kernel: 4.4.0-75-generic
>> > librbd: 10.2.7-1xenial
>> > """
>> >
>> > We did try to flush and invalidate the client cache via the Ceph admin
>> > socket, but this did not change any of the memory usage values.
>> >
>> > Does anyone else encounter similar issues, or does anyone have an
>> > explanation for the high memory overhead?
>> >
>> > Best Regards
>> > Sebastian
>
> --
> Sebastian Nickel
> Nine Internet Solutions AG, Albisriederstr. 243a, CH-8047 Zuerich
> Tel +41 44 637 40 00 | Support +41 44 637 40 40 | www.nine.ch
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

--
Jason

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
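For reference, the rbd cache byte values quoted above translate to far
less memory than the multi-gigabyte overheads being reported; a quick
arithmetic check, using the values straight from the quoted ceph.conf:

"""
# Convert the quoted ceph.conf byte values to MiB.
MIB = 1024 * 1024
print(134217728 / MIB)   # rbd cache size         -> 128.0 MiB
print(100663296 / MIB)   # rbd cache max dirty    ->  96.0 MiB
print(50331648 / MIB)    # rbd cache target dirty ->  48.0 MiB
"""

With a single RBD volume per VM, the configured cache accounts for at
most roughly 128 MiB of the observed RSS, consistent with nick's point
that the cache settings alone cannot explain the overhead.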