Dominik,

Would you try "oflag=direct" when you do the tests in the guests, and make
sure /sys/block/xxx/queue/iosched/group_isolation is set to 1? I guess with
such a setting, your tests should go well.

Thanks,
Gui

Vivek Goyal wrote:
> On Fri, Feb 18, 2011 at 03:42:45PM +0100, Dominik Klein wrote:
>> Hi Vivek
>>
>> I don't know whether you follow the libvirt list; I assume you don't. So
>> I thought I'd forward you an e-mail involving the blkio controller and a
>> terrible situation arising from using it (maybe in a wrong way).
>>
>> I'd truly appreciate it if you read it and commented on it. Maybe I did
>> something wrong, but maybe I also found a bug in some way.
>
> Hi Dominik,
>
> Thanks for forwarding me this mail. Yes, I am not on libvir-list. I have
> just now subscribed.
>
> Few questions inline.
>
>> -------- Original Message --------
>> Subject: Re: [PATCH 0/6 v3] Add blkio cgroup support
>> Date: Fri, 18 Feb 2011 14:42:51 +0100
>> From: Dominik Klein <dk@xxxxxxxxxxxxxxxx>
>> To: libvir-list@xxxxxxxxxx
>>
>> Hi
>>
>> back with some testing results.
>>
>>>> how about starting the guest with the option "cache=none" to bypass
>>>> the pagecache? This should help, I think.
>>> I will read up on where to set that and give it a try. Thanks for the
>>> hint.
>>
>> So here's what I did and found out:
>>
>> The host system has two 12-core CPUs and 128 GB of RAM.
>>
>> I have 8 test VMs named kernel1 to kernel8. Each VM has 4 VCPUs, 2 GB of
>> RAM and one disk, which is an LV on the host. Cache mode is "none":
>
> So you have only one root SATA disk and set up a linear logical volume on
> that? If not, can you give more info about the storage configuration?
>
> - I am assuming you are using CFQ on your underlying physical disk.
>
> - What kernel version are you testing with?
>
> - cache=none mode is good; it should make all the IO O_DIRECT on the host
>   and it should show up as SYNC IO on CFQ without losing io context info.
>   The only problem is the intermediate dm layer and whether it is changing
>   the io context somehow. I am not sure at this point of time.
>
> - Is it possible to capture a 10-15 second blktrace on your underlying
>   physical device? That should give me some idea of what's happening.
>
> - Can you also try setting /sys/block/<disk>/queue/iosched/group_isolation=1
>   on your underlying physical device where CFQ is running and see if it
>   makes any difference?
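[For reference, the last two checks look roughly like this on the host. A
minimal sketch, assuming the underlying physical disk is /dev/sda (a
placeholder) and the blktrace package is installed; the dd line shows Gui's
oflag=direct suggestion for the guests.]

  # Host: enable group isolation for CFQ on the physical disk.
  echo 1 > /sys/block/sda/queue/iosched/group_isolation

  # Host: capture ~15 seconds of block-layer events while the tests run,
  # then render them into a readable trace.
  blktrace -d /dev/sda -w 15 -o mytrace
  blkparse -i mytrace > mytrace.txt

  # Guest: write test that bypasses the guest page cache.
  dd if=/dev/zero of=testfile bs=1M count=1500 oflag=direct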
>> for vm in kernel1 kernel2 kernel3 kernel4 kernel5 kernel6 kernel7 kernel8; do virsh dumpxml $vm | grep cache; done
>> <driver name='qemu' type='raw' cache='none'/>
>> <driver name='qemu' type='raw' cache='none'/>
>> <driver name='qemu' type='raw' cache='none'/>
>> <driver name='qemu' type='raw' cache='none'/>
>> <driver name='qemu' type='raw' cache='none'/>
>> <driver name='qemu' type='raw' cache='none'/>
>> <driver name='qemu' type='raw' cache='none'/>
>> <driver name='qemu' type='raw' cache='none'/>
>>
>> My goal is to give more I/O time to kernel1 and kernel2 than to the rest
>> of the VMs.
>>
>> mount -t cgroup -o blkio none /mnt
>> cd /mnt
>> mkdir important
>> mkdir notimportant
>>
>> echo 1000 > important/blkio.weight
>> echo 100 > notimportant/blkio.weight
>>
>> for vm in kernel3 kernel4 kernel5 kernel6 kernel7 kernel8; do
>>   cd /proc/$(pgrep -f "qemu-kvm.*$vm")/task
>>   for task in *; do
>>     /bin/echo $task > /mnt/notimportant/tasks
>>   done
>> done
>>
>> for vm in kernel1 kernel2; do
>>   cd /proc/$(pgrep -f "qemu-kvm.*$vm")/task
>>   for task in *; do
>>     /bin/echo $task > /mnt/important/tasks
>>   done
>> done
>>
>> Then I used cssh to connect to all 8 VMs and execute
>>
>>   dd if=/dev/zero of=testfile bs=1M count=1500
>>
>> in all VMs simultaneously.
>>
>> Results are:
>> kernel1: 47.5593 s, 33.1 MB/s
>> kernel2: 60.1464 s, 26.2 MB/s
>> kernel3: 74.204 s, 21.2 MB/s
>> kernel4: 77.0759 s, 20.4 MB/s
>> kernel5: 65.6309 s, 24.0 MB/s
>> kernel6: 81.1402 s, 19.4 MB/s
>> kernel7: 70.3881 s, 22.3 MB/s
>> kernel8: 77.4475 s, 20.3 MB/s
>>
>> Results vary a little from run to run, but the difference is nothing
>> spectacular, as weights of 1000 vs. 100 would suggest it should be.
>>
>> So I went and tried to throttle I/O of kernel3-8 to 10 MB/s instead of
>> weighting I/O. First I rebooted everything so that no old cgroup
>> configuration was left in place, and then set everything up except the
>> 100 and 1000 weight configuration.
>>
>> Quote from blkio.txt:
>> ------------
>> - blkio.throttle.write_bps_device
>>   - Specifies the upper limit on the WRITE rate to the device. The IO
>>     rate is specified in bytes per second. Rules are per device.
>>     Following is the format.
>>
>>     echo "<major>:<minor> <rate_bytes_per_second>" >
>>       /cgrp/blkio.throttle.write_bps_device
>> ------------
>>
>> for vm in kernel1 kernel2 kernel3 kernel4 kernel5 kernel6 kernel7 kernel8; do ls -lH /dev/vdisks/$vm; done
>> brw-rw---- 1 root root 254, 23 Feb 18 13:45 /dev/vdisks/kernel1
>> brw-rw---- 1 root root 254, 24 Feb 18 13:45 /dev/vdisks/kernel2
>> brw-rw---- 1 root root 254, 25 Feb 18 13:45 /dev/vdisks/kernel3
>> brw-rw---- 1 root root 254, 26 Feb 18 13:45 /dev/vdisks/kernel4
>> brw-rw---- 1 root root 254, 27 Feb 18 13:45 /dev/vdisks/kernel5
>> brw-rw---- 1 root root 254, 28 Feb 18 13:45 /dev/vdisks/kernel6
>> brw-rw---- 1 root root 254, 29 Feb 18 13:45 /dev/vdisks/kernel7
>> brw-rw---- 1 root root 254, 30 Feb 18 13:45 /dev/vdisks/kernel8
>>
>> /bin/echo 254:25 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
>> /bin/echo 254:26 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
>> /bin/echo 254:27 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
>> /bin/echo 254:28 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
>> /bin/echo 254:29 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
>> /bin/echo 254:30 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
>>
>> Then I ran the previous test again. This resulted in an ever-increasing
>> load on the host system (the last value I saw was ~300). This is
>> perfectly reproducible.
>>
>> uptime
>> Fri Feb 18 14:42:17 2011
>> 14:42:17 up 12 min, 9 users, load average: 286.51, 142.22, 56.71
>
> Have you run top or something to figure out why the load average is
> shooting up? I suspect that because of the throttling limit, IO threads
> have been blocked and qemu is forking more IO threads. Can you just run
> top/ps and figure out what's happening?
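[A quick way to test that hypothesis, sketched with the same pgrep pattern
used for the cgroup setup above:]

  # Count qemu-kvm threads per VM; a steadily growing count would support
  # the theory that blocked, throttled I/O makes qemu fork more workers.
  for vm in kernel1 kernel2 kernel3 kernel4 kernel5 kernel6 kernel7 kernel8; do
    pid=$(pgrep -f "qemu-kvm.*$vm")
    echo "$vm: $(ls /proc/$pid/task | wc -l) threads"
  done

  # Tasks in uninterruptible sleep (state D) drive the load average up
  # without consuming CPU; list them to see where things are stuck.
  ps -eo state,pid,comm | awk '$1 == "D"'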
>
> Again, is it some kind of linear volume group from which you have carved
> out logical volumes for each virtual machine?
>
> For throttling, can we do a simple test first? That is, run a single
> virtual machine, put some throttling limit on its logical volume and try
> to do READs. Once READs work, let's test WRITEs and check why the system
> load goes up.
>
> Thanks
> Vivek

--
Regards
Gui Jianfeng

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
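[For completeness, the single-VM test Vivek proposes could look like the
following sketch. The 254:23 major:minor pair is kernel1's LV from the
listing above; the cgroup name "throttletest" and the 10 MB/s limit are
arbitrary choices.]

  # Host: throttle READs on kernel1's logical volume and put only that
  # VM's tasks into the cgroup.
  mount -t cgroup -o blkio none /mnt
  mkdir /mnt/throttletest
  echo "254:23 10000000" > /mnt/throttletest/blkio.throttle.read_bps_device
  for task in /proc/$(pgrep -f "qemu-kvm.*kernel1")/task/*; do
    /bin/echo $(basename $task) > /mnt/throttletest/tasks
  done

  # Guest: read back a previously written file, bypassing the guest cache.
  dd if=testfile of=/dev/null bs=1M iflag=direct

  # If READs behave, repeat with blkio.throttle.write_bps_device and the
  # write test from earlier in the thread.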