On Mon, 2010-08-02 at 21:50 +0100, Stefan Hajnoczi wrote:
> On Mon, Aug 2, 2010 at 6:46 PM, Anthony Liguori <anthony@xxxxxxxxxxxxx> wrote:
> > On 08/02/2010 12:15 PM, John Leach wrote:
> >>
> >> Hi,
> >>
> >> I've come across a problem with read and write disk IO performance when
> >> using O_DIRECT from within a kvm guest. With O_DIRECT, reads and writes
> >> are much slower with smaller block sizes. Depending on the block size
> >> used, I've seen 10 times slower.
> >>
> >> For example, with an 8k block size, reading directly from /dev/vdb
> >> without O_DIRECT I see 750 MB/s, but with O_DIRECT I see 79 MB/s.
> >>
> >> As a comparison, reading in O_DIRECT mode in 8k blocks directly from the
> >> backend device on the host gives 2.3 GB/s. Reading in O_DIRECT mode
> >> from a Xen guest on the same hardware manages 263 MB/s.
> >>
> >
> > Stefan has a few fixes for this behavior that help a lot. One of them
> > (avoiding memset) is already upstream but not in 0.12.x.

Anthony, that patch is already applied in the RHEL6 package I've been
testing with - I've just manually confirmed that. Thanks though.

> >
> > The other two are not done yet but should be on the ML in the next
> > couple of weeks. They involve using ioeventfd for notification and
> > unlocking the block queue lock while doing a kick notification.
>
> Thanks for mentioning those patches. The ioeventfd patch will be sent
> this week. I'm checking that migration works correctly and then need
> to check that vhost-net still works.

I'll give them a test as soon as I can get hold of them, thanks Stefan!

> >> Writing is affected in the same way, and exhibits the same behaviour
> >> with O_SYNC too.
> >>
> >> Watching with vmstat on the host, I see the same number of blocks being
> >> read, but about 14 times the number of context switches in O_DIRECT
> >> mode (4500 cs vs. 63000 cs) and a little more CPU usage.
> >>
> >> The device I'm writing to is a device-mapper zero device that generates
> >> zeros on read and throws away writes. You can set it up
> >> at /dev/mapper/zero like this:
> >>
> >> echo "0 21474836480 zero" | dmsetup create zero
> >>
> >> My libvirt config for the disk is:
> >>
> >>   <disk type='block' device='disk'>
> >>     <driver cache='none'/>
> >>     <source dev='/dev/mapper/zero'/>
> >>     <target dev='vdb' bus='virtio'/>
> >>     <address type='pci' domain='0x0000' bus='0x00' slot='0x06'
> >>      function='0x0'/>
> >>   </disk>
> >>
> >> which translates to the kvm args:
> >>
> >> -device
> >> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
> >> -drive file=/dev/mapper/zero,if=none,id=drive-virtio-disk1,cache=none
> >>
> >> I'm testing with dd:
> >>
> >> dd if=/dev/vdb of=/dev/null bs=8k iflag=direct
> >>
> >> As a side note, as you increase the block size, read performance in
> >> O_DIRECT mode starts to overtake non-O_DIRECT reads (from about a 150k
> >> block size). By a 550k block size I'm seeing 1 GB/s reads with
> >> O_DIRECT and 770 MB/s without.
>
> Can you take QEMU out of the picture and run the same test on the host:
>
> dd if=/dev/vdb of=/dev/null bs=8k iflag=direct
> vs
> dd if=/dev/vdb of=/dev/null bs=8k
>
> This isn't quite the same because QEMU will use a helper thread doing
> preadv. I'm not sure what syscall dd will use.
>
> It should be close enough to determine whether QEMU and device
> emulation are involved at all though, or whether these differences are
> due to the host kernel code path down to the device mapper zero device
> being different for normal vs O_DIRECT.
dd if=/dev/mapper/zero of=/dev/null bs=8k count=1000000 iflag=direct
8192000000 bytes (8.2 GB) copied, 3.46529 s, 2.4 GB/s

dd if=/dev/mapper/zero of=/dev/null bs=8k count=1000000
8192000000 bytes (8.2 GB) copied, 5.5741 s, 1.5 GB/s

dd is just using read.

Thanks,

John.
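
For anyone reproducing the host-side numbers above, the following is a
minimal C sketch (not part of the original thread) of the kind of
O_DIRECT read loop that dd performs with iflag=direct. The device path,
block size and block count mirror the test above; the only extra
requirement O_DIRECT imposes here is an aligned buffer, obtained with
posix_memalign.

/*
 * Sketch of an O_DIRECT read loop comparable to
 *   dd if=/dev/mapper/zero of=/dev/null bs=8k count=1000000 iflag=direct
 * One aligned 8k read() per block; error handling is minimal.
 */
#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BLOCK_SIZE 8192         /* matches bs=8k */
#define NUM_BLOCKS 1000000      /* matches count=1000000 */
#define ALIGNMENT  4096         /* O_DIRECT needs a sector/page aligned buffer */

int main(void)
{
    void *buf;
    long long total = 0;
    int i;

    int fd = open("/dev/mapper/zero", O_RDONLY | O_DIRECT);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    if (posix_memalign(&buf, ALIGNMENT, BLOCK_SIZE) != 0) {
        fprintf(stderr, "posix_memalign failed\n");
        return 1;
    }

    /*
     * Each block is a separate read() syscall.  Run inside the guest
     * against /dev/vdb, each request also costs a virtio notification,
     * which is consistent with the extra context switches seen in
     * vmstat at small block sizes.
     */
    for (i = 0; i < NUM_BLOCKS; i++) {
        ssize_t n = read(fd, buf, BLOCK_SIZE);
        if (n <= 0)
            break;
        total += n;
    }

    printf("read %lld bytes\n", total);
    free(buf);
    close(fd);
    return 0;
}

Build with something like "gcc -O2 -o odirect_read odirect_read.c" and
run it on the host against /dev/mapper/zero, or inside the guest with
the path changed to /dev/vdb, to compare the two code paths directly.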