On Fri, Feb 18, 2011 at 11:31:37AM -0500, Vivek Goyal wrote:
> On Fri, Feb 18, 2011 at 03:42:45PM +0100, Dominik Klein wrote:
> > Hi Vivek
> >
> > I don't know whether you follow the libvirt list, I assume you don't. So
> > I thought I'd forward you an e-mail involving the blkio controller and a
> > terrible situation arising from using it (maybe in a wrong way).
> >
> > I'd truly appreciate it if you read it and commented on it. Maybe I did
> > something wrong, but maybe I also found a bug in some way.
>
> Hi Dominik,
>
> Thanks for forwarding me this mail. Yes, I am not on libvir-list. I have
> just now subscribed.
>
> A few questions inline.
>
> > -------- Original Message --------
> > Subject: Re: [PATCH 0/6 v3] Add blkio cgroup support
> > Date: Fri, 18 Feb 2011 14:42:51 +0100
> > From: Dominik Klein <dk@xxxxxxxxxxxxxxxx>
> > To: libvir-list@xxxxxxxxxx
> >
> > Hi
> >
> > back with some testing results.
> >
> > >> how about starting the Guest with option "cache=none" to bypass the
> > >> pagecache? This should help, I think.
> > >
> > > I will read up on where to set that and give it a try. Thanks for the hint.
> >
> > So here's what I did and found out:
> >
> > The host system has 2 12-core CPUs and 128 GB of RAM.
> >
> > I have 8 test VMs named kernel1 to kernel8. Each VM has 4 VCPUs, 2 GB of
> > RAM and one disk, which is an LV on the host. Cache mode is "none":
>
> So you have only one root SATA disk and set up a linear logical volume on
> that? If not, can you give more info about the storage configuration?
>
> - I am assuming you are using CFQ on your underlying physical disk.
>
> - What kernel version are you testing with?
>
> - Cache=none mode is good: it should make all the IO O_DIRECT on the host,
>   and it should show up as SYNC IO in CFQ without losing io context info.
>   The only problem is the intermediate dm layer and whether it is changing
>   the io context somehow. I am not sure at this point of time.
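(Not part of the original mail: the storage-configuration questions above could be answered on the host with something like the following diagnostic sketch. The names `vg0/kernel1` and `sda` are placeholders for the actual guest LV and underlying physical disk, which the mail does not name.)

```shell
# Hypothetical names: vg0/kernel1 for one guest's LV, sda for the physical disk.
uname -r                                    # kernel version under test

# Is the LV a plain linear mapping onto a single physical disk?
lvs -o lv_name,segtype,devices vg0/kernel1
dmsetup table /dev/vg0/kernel1              # a "linear" target is a simple remap

# Which IO scheduler runs on the underlying disk? (CFQ is assumed above.)
cat /sys/block/sda/queue/scheduler          # e.g. "noop deadline [cfq]"
```

These commands need root and refer to real devices, so the output is entirely system-specific; they are shown only as a way to gather the information Vivek is asking for.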
> - Is it possible to capture a 10-15 second blktrace on your underlying
>   physical device? That should give me some idea of what's happening.
>
> - Can you also try setting /sys/block/<disk>/queue/iosched/group_isolation=1
>   on your underlying physical device where CFQ is running and see if it
>   makes any difference?

Dominik,

Apart from setting group_isolation=1, I would also recommend running some
tests on READS. Service differentiation is much more visible there.

Why? Because in the case of writes I am seeing that there are extended
periods where there is no IO on the underlying device from the higher-weight
virtual machine. I am not sure what that virtual machine is doing for that
duration, but that's what blktrace shows.

First I ran READS. Two partitions exported to two virtual machines. I
started

  time dd if=/mnt/vdb/testfile of=/dev/zero

and as soon as it finished in the first virtual machine, I stopped the
second virtual machine's job as well (manually; there could be a better test
script, or one could use the fio tool, which allows running timed tests).

[vm1 ~]# time dd if=/mnt/vdb/testfile of=/dev/zero
3072000+0 records in
3072000+0 records out
1572864000 bytes (1.6 GB) copied, 12.35 s, 127 MB/s

real    0m12.503s
user    0m0.527s
sys     0m2.318s

[vm2 ~]# time dd if=/mnt/vdb/testfile of=/dev/zero
420853+0 records in
420852+0 records out
215476224 bytes (215 MB) copied, 12.331 s, 17.5 MB/s

real    0m12.342s
user    0m0.082s
sys     0m0.307s

Here, in the duration of 12 seconds, the first VM did 1.6 GB of READS
(weight 1000) and the second VM did 215 MB of READS (weight 100).

Then I ran some tests on WRITES. After setting group isolation, with two
virtual machines, the following are the results.
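(As a quick sanity check, not part of the original mail: the read throughputs quoted above follow directly from the byte counts and elapsed times, and the observed split works out to roughly 7:1 for the 1000:100 weights.)

```shell
# Recompute throughput (bytes / seconds) for the two read runs quoted above.
awk 'BEGIN {
    vm1 = 1572864000 / 12.35  / 1000000    # weight 1000 VM
    vm2 = 215476224  / 12.331 / 1000000    # weight  100 VM
    printf "vm1: %.0f MB/s\n", vm1         # matches dd output: 127 MB/s
    printf "vm2: %.1f MB/s\n", vm2         # matches dd output: 17.5 MB/s
    printf "observed split: %.1f : 1\n", vm1 / vm2
}'
```

The observed split (about 7.3:1) is smaller than the configured 10:1 weight ratio; blkio weights are proportional-share targets, not exact guarantees.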
[vm1 ~]# time dd if=/dev/zero of=/mnt/vdb/testfile bs=1M count=1500
1500+0 records in
1500+0 records out
1572864000 bytes (1.6 GB) copied, 6.47411 s, 243 MB/s

real    0m6.711s
user    0m0.002s
sys     0m2.233s

[vm2 ~]# time dd if=/dev/zero of=/mnt/vdb/testfile bs=1M count=1500
388+0 records in
388+0 records out
406847488 bytes (407 MB) copied, 6.68171 s, 60.9 MB/s

real    0m6.739s
user    0m0.002s
sys     0m0.697s

The first machine wrote 1.6 GB while the second machine wrote about 400 MB,
and some of that could still be sitting in the second virtual machine's
cache, never having made it to disk. So this is significant service
differentiation, I would say.

Thanks
Vivek

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list