On Mon, Feb 21, 2011 at 03:36:14PM +0800, Gui Jianfeng wrote:
> Dominik,
>
> Would you try "oflag=direct" when you do tests in the guests? And make sure
> /sys/block/xxx/queue/iosched/group_isolation is set to 1.

oflag=direct in the guest might be good for testing and understanding the
problem, but in practice we will not have control over what a user is
running inside the guest. The only control we will have is to use cache=none
for the guest and then control any traffic coming out of the guest.

Thanks
Vivek

>
> I guess with such a setting, your tests should go well.
>
> Thanks,
> Gui
>
> Vivek Goyal wrote:
> > On Fri, Feb 18, 2011 at 03:42:45PM +0100, Dominik Klein wrote:
> >> Hi Vivek
> >>
> >> I don't know whether you follow the libvirt list; I assume you don't. So
> >> I thought I'd forward you an e-mail involving the blkio controller and a
> >> terrible situation arising from using it (maybe in a wrong way).
> >>
> >> I'd truly appreciate it if you read it and commented on it. Maybe I did
> >> something wrong, but maybe I also found a bug of some sort.
> >
> > Hi Dominik,
> >
> > Thanks for forwarding me this mail. Yes, I am not on libvir-list. I have
> > just now subscribed.
> >
> > A few questions inline.
> >
> >> -------- Original Message --------
> >> Subject: Re: [PATCH 0/6 v3] Add blkio cgroup support
> >> Date: Fri, 18 Feb 2011 14:42:51 +0100
> >> From: Dominik Klein <dk@xxxxxxxxxxxxxxxx>
> >> To: libvir-list@xxxxxxxxxx
> >>
> >> Hi
> >>
> >> back with some testing results.
> >>
> >>>> how about starting the guest with the option "cache=none" to bypass the
> >>>> pagecache? This should help, I think.
> >>> I will read up on where to set that and give it a try. Thanks for the hint.
> >>
> >> So here's what I did and found out:
> >>
> >> The host system has two 12-core CPUs and 128 GB of RAM.
> >>
> >> I have 8 test VMs named kernel1 to kernel8. Each VM has 4 VCPUs, 2 GB of
> >> RAM and one disk, which is an LV on the host. Cache mode is "none":
> >
> > So you have only one root SATA disk and set up a linear logical volume on
> > that? If not, can you give more info about the storage configuration?
> >
> > - I am assuming you are using CFQ on your underlying physical disk.
> >
> > - What kernel version are you testing with?
> >
> > - Cache=none mode is good, which should make all the IO O_DIRECT on the
> >   host, and it should show up as SYNC IO in CFQ without losing io context
> >   info. The only problem is the intermediate dm layer and whether it is
> >   changing the io context somehow. I am not sure at this point.
> >
> > - Is it possible to capture a 10-15 second blktrace on your underlying
> >   physical device? That should give me some idea of what's happening.
> >
> > - Can you also try setting /sys/block/<disk>/queue/iosched/group_isolation=1
> >   on your underlying physical device where CFQ is running and see if it
> >   makes any difference (a rough command sketch for this and the blktrace
> >   capture follows below).
> >
> >> for vm in kernel1 kernel2 kernel3 kernel4 kernel5 kernel6 kernel7 kernel8; do virsh dumpxml $vm | grep cache; done
> >> <driver name='qemu' type='raw' cache='none'/>
> >> <driver name='qemu' type='raw' cache='none'/>
> >> <driver name='qemu' type='raw' cache='none'/>
> >> <driver name='qemu' type='raw' cache='none'/>
> >> <driver name='qemu' type='raw' cache='none'/>
> >> <driver name='qemu' type='raw' cache='none'/>
> >> <driver name='qemu' type='raw' cache='none'/>
> >> <driver name='qemu' type='raw' cache='none'/>
> >>
> >> My goal is to give more I/O time to kernel1 and kernel2 than to the rest
> >> of the VMs.
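For reference, the group_isolation and blktrace requests above could be
carried out on the host roughly as follows. This is only a sketch and was not
part of the tests below; "sdb" is just a stand-in for whatever physical disk
actually backs the logical volumes:

  cat /sys/block/sdb/queue/scheduler                    # should show [cfq]
  echo 1 > /sys/block/sdb/queue/iosched/group_isolation
  blktrace -d /dev/sdb -w 15                            # ~15 second trace
  blkparse -i sdb | less                                # inspect the trace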
> >>
> >> mount -t cgroup -o blkio none /mnt
> >> cd /mnt
> >> mkdir important
> >> mkdir notimportant
> >>
> >> echo 1000 > important/blkio.weight
> >> echo 100 > notimportant/blkio.weight
> >>
> >> for vm in kernel3 kernel4 kernel5 kernel6 kernel7 kernel8; do
> >>   cd /proc/$(pgrep -f "qemu-kvm.*$vm")/task
> >>   for task in *; do
> >>     /bin/echo $task > /mnt/notimportant/tasks
> >>   done
> >> done
> >>
> >> for vm in kernel1 kernel2; do
> >>   cd /proc/$(pgrep -f "qemu-kvm.*$vm")/task
> >>   for task in *; do
> >>     /bin/echo $task > /mnt/important/tasks
> >>   done
> >> done
> >>
> >> Then I used cssh to connect to all 8 VMs and execute
> >>   dd if=/dev/zero of=testfile bs=1M count=1500
> >> in all VMs simultaneously.
> >>
> >> Results are:
> >> kernel1: 47.5593 s, 33.1 MB/s
> >> kernel2: 60.1464 s, 26.2 MB/s
> >> kernel3: 74.204 s, 21.2 MB/s
> >> kernel4: 77.0759 s, 20.4 MB/s
> >> kernel5: 65.6309 s, 24.0 MB/s
> >> kernel6: 81.1402 s, 19.4 MB/s
> >> kernel7: 70.3881 s, 22.3 MB/s
> >> kernel8: 77.4475 s, 20.3 MB/s
> >>
> >> Results vary a little bit from run to run, but nothing as spectacular as
> >> weights of 1000 vs. 100 would suggest.
> >>
> >> So I went and tried to throttle I/O of kernel3-8 to 10 MB/s instead of
> >> weighting I/O. First I rebooted everything so that no old cgroup
> >> configuration was left in place, and then set up everything except the
> >> 100 and 1000 weight configuration.
> >>
> >> Quote from blkio.txt:
> >> ------------
> >> - blkio.throttle.write_bps_device
> >>   - Specifies upper limit on WRITE rate to the device. IO rate is
> >>     specified in bytes per second. Rules are per device. Following is
> >>     the format.
> >>
> >>     echo "<major>:<minor> <rate_bytes_per_second>" >
> >>       /cgrp/blkio.throttle.write_bps_device
> >> -------------
> >>
> >> for vm in kernel1 kernel2 kernel3 kernel4 kernel5 kernel6 kernel7 kernel8; do ls -lH /dev/vdisks/$vm; done
> >> brw-rw---- 1 root root 254, 23 Feb 18 13:45 /dev/vdisks/kernel1
> >> brw-rw---- 1 root root 254, 24 Feb 18 13:45 /dev/vdisks/kernel2
> >> brw-rw---- 1 root root 254, 25 Feb 18 13:45 /dev/vdisks/kernel3
> >> brw-rw---- 1 root root 254, 26 Feb 18 13:45 /dev/vdisks/kernel4
> >> brw-rw---- 1 root root 254, 27 Feb 18 13:45 /dev/vdisks/kernel5
> >> brw-rw---- 1 root root 254, 28 Feb 18 13:45 /dev/vdisks/kernel6
> >> brw-rw---- 1 root root 254, 29 Feb 18 13:45 /dev/vdisks/kernel7
> >> brw-rw---- 1 root root 254, 30 Feb 18 13:45 /dev/vdisks/kernel8
> >>
> >> /bin/echo 254:25 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
> >> /bin/echo 254:26 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
> >> /bin/echo 254:27 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
> >> /bin/echo 254:28 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
> >> /bin/echo 254:29 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
> >> /bin/echo 254:30 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
> >> /bin/echo 254:30 10000000 > /mnt/notimportant/blkio.throttle.write_bps_device
> >>
> >> Then I ran the previous test again. This resulted in an ever-increasing
> >> load (last I checked was ~ 300) on the host system. (This is perfectly
> >> reproducible.)
> >>
> >> uptime
> >> Fri Feb 18 14:42:17 2011
> >> 14:42:17 up 12 min, 9 users, load average: 286.51, 142.22, 56.71
> >
> > Have you run top or something to figure out why the load average is
> > shooting up? I suspect that because of the throttling limit, IO threads
> > have been blocked and qemu is forking more IO threads. Can you just run
> > top/ps and figure out what's happening?
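A quick way to gather that information might look like the following; this is
only a sketch, and it assumes the qemu processes are named qemu-kvm as in the
pgrep commands above:

  ps -eLf | grep -c '[q]emu-kvm'                 # rough count of qemu-kvm threads
  ps -eo pid,stat,wchan:30,comm | awk '$2 ~ /^D/'  # tasks in uninterruptible sleep
  vmstat 1 5                                     # 'b' column = blocked tasks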
> > Again, is it some kind of linear volume group from which you have carved
> > out logical volumes for each virtual machine?
> >
> > For throttling, to begin with, can we do a simple test first? That is,
> > run a single virtual machine, put some throttling limit on its logical
> > volume and try to do READs. Once READs work, let's test WRITES and check
> > why the system load goes up.
> >
> > Thanks
> > Vivek
> >
> > --
> > libvir-list mailing list
> > libvir-list@xxxxxxxxxx
> > https://www.redhat.com/mailman/listinfo/libvir-list
> >
>
> --
> Regards
> Gui Jianfeng

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
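For reference, the simple single-VM test proposed above might look roughly
like this; it is only a sketch, the 10 MB/s limit and the 254:23 device number
are just the values from the earlier setup, and it assumes the VM's qemu-kvm
tasks have been moved into the cgroup that carries the limit:

  # host: throttle READs from kernel1's LV (254:23 per the listing above)
  /bin/echo "254:23 10000000" > /mnt/notimportant/blkio.throttle.read_bps_device

  # guest: read the test file back with direct I/O
  dd if=testfile of=/dev/null bs=1M iflag=direct

  # if READ throttling behaves, repeat with blkio.throttle.write_bps_device
  # and a write test (dd if=/dev/zero of=testfile bs=1M count=1500 oflag=direct)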