On 02/21/2011 09:19 AM, Dominik Klein wrote:
>>> - Is it possible to capture a 10-15 second blktrace on your underlying
>>> physical device? That should give me some idea what's happening.
>>
>> Will do, read on.
>
> Just realized I missed this one ... I should have done it right away.
>
> So here goes.
>
> Setup as in the first email: 8 machines, 2 important, 6 not important ones
> with a throttle of ~10M. group_isolation=1. Each VM dd'ing zeroes.
>
> blktrace -d /dev/sdb -w 30
> === sdb ===
>   CPU  0:   4769 events,  224 KiB data
>   CPU  1:  28079 events, 1317 KiB data
>   CPU  2:   1179 events,   56 KiB data
>   CPU  3:   5529 events,  260 KiB data
>   CPU  4:    295 events,   14 KiB data
>   CPU  5:    649 events,   31 KiB data
>   CPU  6:    185 events,    9 KiB data
>   CPU  7:    180 events,    9 KiB data
>   CPU  8:     17 events,    1 KiB data
>   CPU  9:     12 events,    1 KiB data
>   CPU 10:      6 events,    1 KiB data
>   CPU 11:     55 events,    3 KiB data
>   CPU 12:  28005 events, 1313 KiB data
>   CPU 13:   1542 events,   73 KiB data
>   CPU 14:   4814 events,  226 KiB data
>   CPU 15:    389 events,   19 KiB data
>   CPU 16:   1545 events,   73 KiB data
>   CPU 17:    119 events,    6 KiB data
>   CPU 18:   3019 events,  142 KiB data
>   CPU 19:     62 events,    3 KiB data
>   CPU 20:    800 events,   38 KiB data
>   CPU 21:     17 events,    1 KiB data
>   CPU 22:    243 events,   12 KiB data
>   CPU 23:      1 events,    1 KiB data
>   Total:   81511 events (dropped 0), 3822 KiB data
>
> A very constant 296 blocked processes in vmstat during this run. But ...
> apparently no data is being written at all (see the "bo" column).
>
> vmstat 2
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r   b swpd      free  buff  cache  si  so  bi   bo    in    cs us sy id wa
>  0 296    0 125254224 21432 142016   0   0  16  633   181   331  0  0 93  7
>  0 296    0 125253728 21432 142016   0   0   0    0 17115 33794  0  0 25 75
>  0 296    0 125254112 21432 142016   0   0   0    0 17084 33721  0  0 25 74
>  1 296    0 125254352 21440 142012   0   0   0   18 17047 33736  0  0 25 75
>  0 296    0 125304224 21440 131060   0   0   0    0 17630 33989  0  1 23 76
>  1 296    0 125306496 21440 130260   0   0   0    0 16810 33401  0  0 20 80
>  4 296    0 125307208 21440 129856   0   0   0    0 17169 33744  0  0 26 74
>  0 296    0 125307496 21448 129508   0   0   0   14 17105 33650  0  0 36 64
>  0 296    0 125307712 21452 129672   0   0   2 1340 17117 33674  0  0 22 78
>  1 296    0 125307752 21452 129520   0   0   0    0 16875 33438  0  0 29 70
>  1 296    0 125307776 21452 129520   0   0   0    0 16959 33560  0  0 21 79
>  1 296    0 125307792 21460 129520   0   0   0   12 16700 33239  0  0 15 85
>  1 296    0 125307808 21460 129520   0   0   0    0 16750 33274  0  0 25 74
>  1 296    0 125307808 21460 129520   0   0   0    0 17020 33601  0  0 26 74
>  1 296    0 125308272 21460 129520   0   0   0    0 17080 33616  0  0 20 80
>  1 296    0 125308408 21460 129520   0   0   0    0 16428 32972  0  0 42 58
>  1 296    0 125308016 21460 129524   0   0   0    0 17021 33624  0  0 22 77

While we're on that ... it is now impossible for me to recover from this
state without pulling the power plug.

On the VMs' consoles I see messages like

INFO: task (kjournald|flush-254|dd|rs:main|...) blocked for more than 120 seconds.

While the ssh sessions through which the dd commands were started still seem
intact (pressing enter gives a new line), it is impossible to cancel the dd
command. Logging in on a VM's console is also impossible. Opening a new ssh
session to the host does not work either. Killing the qemu-kvm processes from
a session opened earlier leaves zombie processes. Moving the VMs back to the
root cgroup makes no difference either.

Regards
Dominik

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
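
[Editor's note: the thread does not show how the ~10M throttle with
group_isolation=1 was actually configured. A minimal sketch of a typical
setup with the cgroup v1 blkio controller and the CFQ scheduler follows;
the cgroup mount point, the group name "unimportant", the QEMU_PID variable
and the 8:16 major:minor number of /dev/sdb are assumptions, not taken from
the thread.]

  # Enable isolation between CFQ groups on the underlying device (sdb assumed)
  echo 1 > /sys/block/sdb/queue/iosched/group_isolation

  # Create a blkio cgroup for the "not important" VMs and cap their writes
  # to /dev/sdb (major:minor 8:16 assumed) at ~10 MB/s, given in bytes/s
  mkdir -p /sys/fs/cgroup/blkio/unimportant
  echo "8:16 10485760" > /sys/fs/cgroup/blkio/unimportant/blkio.throttle.write_bps_device

  # Move one of the qemu-kvm processes into that group (PID is hypothetical)
  echo "$QEMU_PID" > /sys/fs/cgroup/blkio/unimportant/tasks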
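
[Editor's note: "blktrace -d /dev/sdb -w 30" leaves one binary trace file
per CPU (sdb.blktrace.0 through sdb.blktrace.23) in the working directory.
One way to check whether any writes actually completed on the device during
the stall, assuming blkparse's default output format where the sixth field
is the action and the seventh the RWBS flags, is sketched below.]

  # Merge the per-CPU trace files into one human-readable stream
  blkparse -i sdb -o sdb.parsed.txt

  # Count completed writes ("C" actions whose RWBS contains W); with vmstat
  # showing bo = 0 this count would be expected to stay near zero
  awk '$6 == "C" && $7 ~ /W/' sdb.parsed.txt | wc -l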