I know we noticed high memory usage due to librados in the Ceph
multipathd checker [1] -- on the order of hundreds of megabytes. That
client was probably nearly as trivial as an application can get, and I
just assumed it was due to large monitor maps being sent to the client
for whatever reason. Since we changed course on our RBD iSCSI
implementation, the investigation into this high memory usage
unfortunately fell by the wayside.

[1] http://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=blob;f=libmultipath/checkers/rbd.c;h=9ea0572f2b5bd41b80bf2601137b74f92bdc7278;hb=HEAD

On Thu, Apr 27, 2017 at 5:26 AM, nick <nick@xxxxxxx> wrote:
> Hi Christian,
> thanks for your answer.
> The highest value I can see for a local storage VM in our infrastructure is a
> memory overhead of 39%. This is big, but the majority (>90%) of our local
> storage VMs are using less than 10% memory overhead.
> For Ceph storage based VMs this looks quite different. The highest value I can
> see currently is 244% memory overhead, so that specific VM with 3 GB of
> allocated memory is now using 10.3 GB of RSS on the physical host. This is a
> really huge value. In general I can see that the majority of the Ceph based
> VMs have more than 60% memory overhead.
>
> Maybe this is also a bug related to qemu+librbd. It would just be nice to know
> whether other people are seeing such high values as well.
>
> Cheers
> Sebastian
>
> On Thursday, April 27, 2017 06:10:48 PM you wrote:
>> Hello,
>>
>> Definitely seeing about 20% overhead with Hammer as well, so it is not
>> version specific from where I'm standing.
>>
>> While non-RBD storage VMs by and large tend to be closer to the specified
>> size, I've seen them exceed it by a few % at times, too.
>> For example a 4317968 KB RSS one that ought to be 4 GB.
>>
>> Regards,
>>
>> Christian
>>
>> On Thu, 27 Apr 2017 09:56:48 +0200 nick wrote:
>> > Hi,
>> > we are running a Jewel Ceph cluster which serves RBD volumes for our KVM
>> > virtual machines. Recently we noticed that our KVM machines use a lot more
>> > memory on the physical host system than they should. We collect the data
>> > with a Python script which basically executes 'virsh dommemstat
>> > <virtual machine name>'. We also verified the results of the script
>> > against the memory stats in 'cat /proc/<kvm PID>/status' for each virtual
>> > machine, and the results are the same.
>> >
>> > Here is an excerpt for one physical host where all virtual machines have
>> > been running since yesterday (virtual machine names removed):
>> >
>> > """
>> > overhead    actual    percent_overhead  rss
>> > ----------  --------  ----------------  --------
>> > 423.8 MiB   2.0 GiB   20                2.4 GiB
>> > 460.1 MiB   4.0 GiB   11                4.4 GiB
>> > 471.5 MiB   1.0 GiB   46                1.5 GiB
>> > 472.6 MiB   4.0 GiB   11                4.5 GiB
>> > 681.9 MiB   8.0 GiB   8                 8.7 GiB
>> > 156.1 MiB   1.0 GiB   15                1.2 GiB
>> > 278.6 MiB   1.0 GiB   27                1.3 GiB
>> > 290.4 MiB   1.0 GiB   28                1.3 GiB
>> > 291.5 MiB   1.0 GiB   28                1.3 GiB
>> > 0.0 MiB     16.0 GiB  0                 13.7 GiB
>> > 294.7 MiB   1.0 GiB   28                1.3 GiB
>> > 135.6 MiB   1.0 GiB   13                1.1 GiB
>> > 0.0 MiB     2.0 GiB   0                 1.4 GiB
>> > 1.5 GiB     4.0 GiB   37                5.5 GiB
>> > """
>> >
>> > We are using the RBD client cache for our virtual machines, but it is set
>> > to only 128 MB per machine. There is also only one RBD volume per virtual
>> > machine. We have seen more than 200% memory overhead per KVM machine on
>> > other physical machines. After a live migration of the virtual machine to
>> > another host the overhead goes back to 0 and then slowly climbs back to
>> > high values.
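For reference, nick's collection script is not included in the thread; a
minimal sketch of that kind of measurement (parsing 'virsh dommemstat'
and cross-checking against VmRSS from /proc/<pid>/status) could look like
the following. The helper names and the optional PID argument are
illustrative assumptions, not the actual script, and it assumes virsh
reports its values in KiB:

"""
#!/usr/bin/env python3
# Hedged sketch: per-VM memory overhead from 'virsh dommemstat',
# optionally cross-checked against VmRSS in /proc/<pid>/status.
import subprocess
import sys

def dommemstat(domain):
    # Parse 'virsh dommemstat <domain>' output into a dict of KiB values.
    out = subprocess.check_output(["virsh", "dommemstat", domain], text=True)
    stats = {}
    for line in out.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stats[parts[0]] = int(parts[1])
    return stats

def proc_vmrss_kib(pid):
    # Read VmRSS (kB) of the qemu process from /proc/<pid>/status.
    with open("/proc/%d/status" % pid) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return None

def overhead(domain, pid=None):
    stats = dommemstat(domain)
    actual_kib = stats["actual"]       # memory currently given to the guest
    rss_kib = stats.get("rss")         # RSS of the qemu process, per libvirt
    if pid is not None:
        rss_kib = proc_vmrss_kib(pid)  # cross-check against /proc instead
    over_kib = max(rss_kib - actual_kib, 0)
    return {
        "overhead_mib": round(over_kib / 1024.0, 1),
        "actual_gib": round(actual_kib / 1024.0 ** 2, 1),
        "percent_overhead": int(100.0 * over_kib / actual_kib),
        "rss_gib": round(rss_kib / 1024.0 ** 2, 1),
    }

if __name__ == "__main__":
    pid = int(sys.argv[2]) if len(sys.argv) > 2 else None
    print(overhead(sys.argv[1], pid))
"""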
>> >
>> > Here are our ceph.conf settings for the clients:
>> > """
>> > [client]
>> > rbd cache writethrough until flush = False
>> > rbd cache max dirty = 100663296
>> > rbd cache size = 134217728
>> > rbd cache target dirty = 50331648
>> > """
>> >
>> > We have noticed this behavior since we started using the Jewel librbd
>> > libraries. We did not encounter it when using the Ceph Infernalis librbd
>> > version. We also do not see this issue when using local storage instead
>> > of Ceph.
>> >
>> > Some version information for the physical host which runs the KVM
>> > machines:
>> > """
>> > OS: Ubuntu 16.04
>> > kernel: 4.4.0-75-generic
>> > librbd: 10.2.7-1xenial
>> > """
>> >
>> > We did try to flush and invalidate the client cache via the Ceph admin
>> > socket, but this did not change any of the memory usage values.
>> >
>> > Does anyone else encounter similar issues, or does anyone have an
>> > explanation for the high memory overhead?
>> >
>> > Best Regards
>> > Sebastian
>
> --
> Sebastian Nickel
> Nine Internet Solutions AG, Albisriederstr. 243a, CH-8047 Zuerich
> Tel +41 44 637 40 00 | Support +41 44 637 40 40 | www.nine.ch
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

--
Jason

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
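For reference, the rbd cache byte values quoted above translate to far
less memory than the multi-gigabyte overheads being reported; a quick
arithmetic check, using the values straight from the quoted ceph.conf:

"""
# Convert the quoted ceph.conf byte values to MiB.
MIB = 1024 * 1024
print(134217728 / MIB)   # rbd cache size         -> 128.0 MiB
print(100663296 / MIB)   # rbd cache max dirty    ->  96.0 MiB
print(50331648 / MIB)    # rbd cache target dirty ->  48.0 MiB
"""

With a single RBD volume per VM, the configured cache accounts for at
most roughly 128 MiB of the observed RSS, consistent with nick's point
that the cache settings alone cannot explain the overhead.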