Hi,
I used one of the fio example files and changed it a bit:

"""
# This job file tries to mimic the Intel IOMeter File Server Access Pattern
[global]
description=Emulation of Intel IOmeter File Server Access Pattern
randrepeat=0
filename=/root/test.dat
# IOMeter defines the server loads as the following:
# iodepth=1     Linear
# iodepth=4     Very Light
# iodepth=8     Light
# iodepth=64    Moderate
# iodepth=256   Heavy
iodepth=8
size=80g
direct=0
ioengine=libaio

[iometer]
stonewall
bs=4M
rw=randrw

[iometer_just_write]
stonewall
bs=4M
rw=write

[iometer_just_read]
stonewall
bs=4M
rw=read
"""

Then let it run:

$> while true; do fio stress.fio; rm /root/test.dat; done

I had this running over a weekend.

Cheers
Sebastian

On Tuesday, May 02, 2017 02:51:06 PM Jason Dillaman wrote:
> Can you share the fio job file that you utilized so I can attempt to
> repeat locally?
>
> On Tue, May 2, 2017 at 2:51 AM, nick <nick@xxxxxxx> wrote:
> > Hi Jason,
> > thanks for your feedback. I have now done some tests over the weekend
> > to verify the memory overhead.
> > I was using qemu 2.8 (taken from the Ubuntu Cloud Archive) with librbd
> > 10.2.7 on Ubuntu 16.04 hosts. I suspected the ceph rbd cache to be the
> > cause of the overhead, so I just generated a lot of IO with the help of
> > fio in the VMs (with a data size of 80 GB). All VMs had 3 GB of memory.
> > I had to run fio multiple times before reaching high RSS values.
> > I also noticed that when using larger block sizes during writes (like
> > 4M), the memory overhead in the KVM process increased faster.
> > I ran several fio tests (one after another) and the results are:
> >
> > KVM with writeback RBD cache: max. 85% memory overhead (2.5 GB overhead)
> > KVM with writethrough RBD cache: max. 50% memory overhead
> > KVM without RBD caching: less than 10% overhead all the time
> > KVM with local storage (logical volume used): 8% overhead all the time
> >
> > I did not reach those >200% memory overhead results that we see on our
> > live cluster, but those virtual machines have a much longer uptime as
> > well.
> >
> > I also tried to reduce the RSS memory value by dropping caches on the
> > physical host and in the VM. Neither led to any change. A reboot of the
> > VM does not change anything either (a reboot inside the VM, not a new
> > KVM process). So far, the only way to reduce the RSS memory value is a
> > live migration. Might this be a bug? The memory overhead seems a bit
> > too high to me.
> >
> > Best Regards
> > Sebastian
> >
> > On Thursday, April 27, 2017 10:08:36 AM you wrote:
> >> I know we noticed high memory usage due to librados in the Ceph
> >> multipathd checker [1] -- on the order of hundreds of megabytes. That
> >> client was probably nearly as trivial as an application can get, and I
> >> just assumed it was due to large monitor maps being sent to the client
> >> for whatever reason. Since we changed course on our RBD iSCSI
> >> implementation, unfortunately the investigation into this high memory
> >> usage fell by the wayside.
> >>
> >> [1] http://git.opensvc.com/gitweb.cgi?p=multipath-tools/.git;a=blob;f=libmultipath/checkers/rbd.c;h=9ea0572f2b5bd41b80bf2601137b74f92bdc7278;hb=HEAD
> >>
> >> On Thu, Apr 27, 2017 at 5:26 AM, nick <nick@xxxxxxx> wrote:
> >> > Hi Christian,
> >> > thanks for your answer.
> >> > The highest value I can see for a local storage VM in our
> >> > infrastructure is a memory overhead of 39%. This is big, but the
> >> > majority (>90%) of our local storage VMs are using less than 10%
> >> > memory overhead.
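
(For context: the overhead numbers in this thread come from 'virsh dommemstat',
as described in my first mail quoted further down. Below is a simplified sketch
of that kind of collection script; it is not the exact script we run, and the
domain names are just placeholders. 'virsh dommemstat' reports 'actual' and
'rss' in KiB.)

"""
#!/usr/bin/env python
# Simplified sketch of a dommemstat-based overhead collector.
# Domain names are placeholders; no error handling.
import subprocess

def dommemstat(domain):
    # parse 'virsh dommemstat <domain>' output ("name value" per line, KiB)
    out = subprocess.check_output(["virsh", "dommemstat", domain]).decode()
    stats = {}
    for line in out.splitlines():
        fields = line.split()
        if len(fields) == 2:
            stats[fields[0]] = int(fields[1])
    return stats

print("overhead      actual        percent_overhead  rss")
for domain in ["vm01", "vm02"]:          # placeholder domain names
    s = dommemstat(domain)
    actual_mib = s["actual"] / 1024.0    # KiB -> MiB
    rss_mib = s["rss"] / 1024.0
    overhead = max(rss_mib - actual_mib, 0.0)
    print("%9.1f MiB %9.1f MiB %17d %9.1f MiB"
          % (overhead, actual_mib, overhead * 100 / actual_mib, rss_mib))
"""

This matches the columns of the table in my first mail below: overhead is
rss minus the configured memory (reported as 0 when rss is below it), and
percent_overhead is that difference relative to the configured memory.
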
> >> > For ceph storage based VMs this looks quite different. The highest
> >> > value I can see currently is 244% memory overhead. So a VM with 3 GB
> >> > of allocated memory is now using 10.3 GB of RSS memory on the
> >> > physical host. This is a really huge value. In general I can see that
> >> > the majority of the ceph based VMs have more than 60% memory overhead.
> >> >
> >> > Maybe this is also a bug related to qemu+librbd. It would just be
> >> > nice to know if other people are seeing those high values as well.
> >> >
> >> > Cheers
> >> > Sebastian
> >> >
> >> > On Thursday, April 27, 2017 06:10:48 PM you wrote:
> >> >> Hello,
> >> >>
> >> >> Definitely seeing about 20% overhead with Hammer as well, so not
> >> >> version specific from where I'm standing.
> >> >>
> >> >> While non-RBD storage VMs by and large tend to be closer to the
> >> >> specified size, I've seen them exceed it by a few % at times, too.
> >> >> For example a 4317968 KB RSS one that ought to be 4 GB.
> >> >>
> >> >> Regards,
> >> >>
> >> >> Christian
> >> >>
> >> >> On Thu, 27 Apr 2017 09:56:48 +0200 nick wrote:
> >> >> > Hi,
> >> >> > we are running a jewel ceph cluster which serves RBD volumes for
> >> >> > our KVM virtual machines. Recently we noticed that our KVM machines
> >> >> > use a lot more memory on the physical host system than they should.
> >> >> > We collect the data with a python script which basically executes
> >> >> > 'virsh dommemstat <virtual machine name>'. We also verified the
> >> >> > results of the script with the memory stats from
> >> >> > 'cat /proc/<kvm PID>/status' for each virtual machine, and the
> >> >> > results are the same.
> >> >> >
> >> >> > Here is an excerpt for one physical host where all virtual machines
> >> >> > have been running since yesterday (virtual machine names removed):
> >> >> >
> >> >> > """
> >> >> > overhead    actual    percent_overhead  rss
> >> >> > ----------  --------  ----------------  --------
> >> >> > 423.8 MiB   2.0 GiB   20                2.4 GiB
> >> >> > 460.1 MiB   4.0 GiB   11                4.4 GiB
> >> >> > 471.5 MiB   1.0 GiB   46                1.5 GiB
> >> >> > 472.6 MiB   4.0 GiB   11                4.5 GiB
> >> >> > 681.9 MiB   8.0 GiB   8                 8.7 GiB
> >> >> > 156.1 MiB   1.0 GiB   15                1.2 GiB
> >> >> > 278.6 MiB   1.0 GiB   27                1.3 GiB
> >> >> > 290.4 MiB   1.0 GiB   28                1.3 GiB
> >> >> > 291.5 MiB   1.0 GiB   28                1.3 GiB
> >> >> > 0.0 MiB     16.0 GiB  0                 13.7 GiB
> >> >> > 294.7 MiB   1.0 GiB   28                1.3 GiB
> >> >> > 135.6 MiB   1.0 GiB   13                1.1 GiB
> >> >> > 0.0 MiB     2.0 GiB   0                 1.4 GiB
> >> >> > 1.5 GiB     4.0 GiB   37                5.5 GiB
> >> >> > """
> >> >> >
> >> >> > We are using the rbd client cache for our virtual machines, but it
> >> >> > is set to only 128MB per machine. There is also only one rbd volume
> >> >> > per virtual machine. We have seen more than 200% memory overhead
> >> >> > per KVM machine on other physical machines. After a live migration
> >> >> > of the virtual machine to another host the overhead is back to 0
> >> >> > and then slowly increases back to high values.
> >> >> >
> >> >> > Here are our ceph.conf settings for the clients:
> >> >> > """
> >> >> > [client]
> >> >> > rbd cache writethrough until flush = False
> >> >> > rbd cache max dirty = 100663296
> >> >> > rbd cache size = 134217728
> >> >> > rbd cache target dirty = 50331648
> >> >> > """
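
(Side note on the byte values in the quoted [client] section: assuming plain
binary units, they correspond to the "only 128MB per machine" cache mentioned
above, with a 96 MiB dirty limit and a 48 MiB dirty target per librbd client.
A quick check:)

"""
# quick sanity check of the [client] cache values above (binary units assumed)
for name, value in [("rbd cache size", 134217728),
                    ("rbd cache max dirty", 100663296),
                    ("rbd cache target dirty", 50331648)]:
    print("%s = %d bytes = %d MiB" % (name, value, value // (1024 * 1024)))
# -> rbd cache size = 134217728 bytes = 128 MiB
# -> rbd cache max dirty = 100663296 bytes = 96 MiB
# -> rbd cache target dirty = 50331648 bytes = 48 MiB
"""
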
> >> >> > We noticed this behavior since we are using the jewel librbd
> >> >> > libraries. We did not encounter this behavior when using the ceph
> >> >> > infernalis librbd version. We also do not see this issue when using
> >> >> > local storage instead of ceph.
> >> >> >
> >> >> > Some version information of the physical host which runs the KVM
> >> >> > machines:
> >> >> > """
> >> >> > OS: Ubuntu 16.04
> >> >> > kernel: 4.4.0-75-generic
> >> >> > librbd: 10.2.7-1xenial
> >> >> > """
> >> >> >
> >> >> > We did try to flush and invalidate the client cache via the ceph
> >> >> > admin socket, but this did not change any memory usage values.
> >> >> >
> >> >> > Does anyone else encounter similar issues, or have an explanation
> >> >> > for the high memory overhead?
> >> >> >
> >> >> > Best Regards
> >> >> > Sebastian

--
Sebastian Nickel
Nine Internet Solutions AG, Albisriederstr. 243a, CH-8047 Zuerich
Tel +41 44 637 40 00 | Support +41 44 637 40 40 | www.nine.ch
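
PS: For anyone who wants to reproduce this, the RSS growth can also be watched
directly while the fio loop from the top of this mail is running. This reads
the same /proc/<kvm PID>/status data we used to cross-check the collection
script; the PID and the sampling interval below are placeholders.

"""
#!/usr/bin/env python
# Watch the RSS of a single KVM process via /proc/<pid>/status while the
# fio loop runs. PID and interval are placeholders.
import time

def vm_rss_kib(pid):
    # VmRSS in /proc/<pid>/status is reported in kB
    with open("/proc/%d/status" % pid) as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])

pid = 12345                              # placeholder qemu/KVM PID
while True:
    print("%s  VmRSS: %.1f MiB" % (time.strftime("%H:%M:%S"),
                                   vm_rss_kib(pid) / 1024.0))
    time.sleep(60)                       # sample once per minute
"""

As mentioned above, the RSS only reached its high values after several fio
runs, so it is worth sampling over a longer period.
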