Re: bad O_DIRECT read and write performance with small block sizes with virtio

On Mon, 2010-08-02 at 21:50 +0100, Stefan Hajnoczi wrote:
> On Mon, Aug 2, 2010 at 6:46 PM, Anthony Liguori <anthony@xxxxxxxxxxxxx> wrote:
> > On 08/02/2010 12:15 PM, John Leach wrote:
> >>
> >> Hi,
> >>
> >> I've come across a problem with read and write disk IO performance when
> >> using O_DIRECT from within a kvm guest.  With O_DIRECT, reads and writes
> >> are much slower with smaller block sizes.  Depending on the block size
> >> used, I've seen them run 10 times slower.
> >>
> >> For example, with an 8k block size, reading directly from /dev/vdb
> >> without O_DIRECT I see 750 MB/s, but with O_DIRECT I see 79 MB/s.
> >>
> >> As a comparison, reading in O_DIRECT mode in 8k blocks directly from the
> >> backend device on the host gives 2.3 GB/s.  Reading in O_DIRECT mode
> >> from a xen guest on the same hardware manages 263 MB/s.
> >>
> >
> > Stefan has a few fixes for this behavior that help a lot.  One of them
> > (avoiding memset) is already upstream but not in 0.12.x.

Anthony, that patch is already applied in the RHEL6 package I've been
testing with - I've just manually confirmed that.  Thanks though.

> >
> > The other two are not done yet but should be on the ML in the next couple
> > weeks.  They involve using ioeventfd for notification and unlocking the
> > block queue lock while doing a kick notification.
> 
> Thanks for mentioning those patches.  The ioeventfd patch will be sent
> this week; I'm checking that migration works correctly, and then I need
> to check that vhost-net still works.

I'll give them a test as soon as I can get hold of them, thanks Stefan!
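
For anyone else following the thread: as I understand it, ioeventfd lets
KVM signal an eventfd directly when the guest writes to a registered
doorbell address, so the virtqueue kick can be picked up by an I/O thread
instead of being handled synchronously in the vcpu thread.  A rough
sketch of the registration (the port address and queue index here are
placeholders, not QEMU's actual wiring):

#include <stdint.h>
#include <string.h>
#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Ask KVM to signal efd whenever the guest writes queue_idx (a 16-bit
 * value) to the virtio-pci queue-notify register at pio_addr. */
static int register_kick_eventfd(int vm_fd, uint64_t pio_addr, uint16_t queue_idx)
{
    int efd = eventfd(0, 0);
    if (efd < 0)
        return -1;

    struct kvm_ioeventfd kick;
    memset(&kick, 0, sizeof(kick));
    kick.addr      = pio_addr;    /* guest I/O port of the notify register */
    kick.len       = 2;           /* queue notify is a 16-bit write */
    kick.fd        = efd;
    kick.datamatch = queue_idx;   /* only fire for this queue's index */
    kick.flags     = KVM_IOEVENTFD_FLAG_PIO | KVM_IOEVENTFD_FLAG_DATAMATCH;

    if (ioctl(vm_fd, KVM_IOEVENTFD, &kick) < 0)
        return -1;

    /* An I/O thread can now block in read(efd, ...) and process the
     * virtqueue without a heavyweight exit back through the vcpu loop. */
    return efd;
}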

> >> Writing is affected in the same way, and exhibits the same behaviour
> >> with O_SYNC too.
> >>
> >> Watching with vmstat on the host, I see the same number of blocks being
> >> read, but about 14 times the number of context switches in O_DIRECT mode
> >> (4500 cs vs. 63000 cs) and a little more cpu usage.
> >>
> >> The device I'm writing to is a device-mapper zero device that generates
> >> zeros on read and throws away writes; you can set it up
> >> at /dev/mapper/zero like this:
> >>
> >> echo "0 21474836480 zero" | dmsetup create zero
> >>
> >> My libvirt config for the disk is:
> >>
> >> <disk type='block' device='disk'>
> >>   <driver cache='none'/>
> >>   <source dev='/dev/mapper/zero'/>
> >>   <target dev='vdb' bus='virtio'/>
> >>   <address type='pci' domain='0x0000' bus='0x00' slot='0x06'
> >> function='0x0'/>
> >> </disk>
> >>
> >> which translates to the kvm arg:
> >>
> >> -device
> >> virtio-blk-pci,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0
> >> -drive file=/dev/mapper/zero,if=none,id=drive-virtio-disk0,cache=none
> >>
> >> I'm testing with dd:
> >>
> >> dd if=/dev/vdb of=/dev/null bs=8k iflag=direct
> >>
> >> As a side note, as you increase the block size, read performance in
> >> O_DIRECT mode starts to overtake non-O_DIRECT mode reads (from about
> >> 150k block size). By 550k block size I'm seeing 1 GB/s reads with
> >> O_DIRECT and 770 MB/s without.
> 
> Can you take QEMU out of the picture and run the same test on the host:
> 
> dd if=/dev/mapper/zero of=/dev/null bs=8k iflag=direct
> vs
> dd if=/dev/mapper/zero of=/dev/null bs=8k
> 
> This isn't quite the same because QEMU will use a helper thread doing
> preadv.  I'm not sure what syscall dd will use.
> 
> It should be close enough to determine whether QEMU and device
> emulation are involved at all though, or whether these differences are
> due to the host kernel code path down to the device mapper zero device
> being different for normal vs O_DIRECT.


dd if=/dev/mapper/zero of=/dev/null bs=8k count=1000000 iflag=direct
8192000000 bytes (8.2 GB) copied, 3.46529 s, 2.4 GB/s

dd if=/dev/mapper/zero of=/dev/null bs=8k count=1000000
8192000000 bytes (8.2 GB) copied, 5.5741 s, 1.5 GB/s

dd is just using read.
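
In case it's useful, the loop dd runs with iflag=direct boils down to
something like the sketch below (not dd's actual source).  The one extra
constraint O_DIRECT adds is that the buffer, offset and transfer size
must be aligned to the device's logical block size, which is why the
buffer is allocated with posix_memalign; QEMU's aio helper thread does
the equivalent transfer with preadv() at an explicit offset rather than
a plain read():

#define _GNU_SOURCE           /* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BLOCK 8192            /* matches bs=8k in the dd runs above */

int main(void)
{
    void *buf;

    /* O_DIRECT needs an aligned buffer; 4096 covers the usual
     * 512-byte logical block size with room to spare. */
    if (posix_memalign(&buf, 4096, BLOCK) != 0)
        return 1;

    int fd = open("/dev/mapper/zero", O_RDONLY | O_DIRECT);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    long long total = 0;
    long long limit = 1000000LL * BLOCK;     /* mirror count=1000000 */
    ssize_t n;
    while (total < limit && (n = read(fd, buf, BLOCK)) > 0)
        total += n;                          /* one read() per block, like dd */

    printf("read %lld bytes\n", total);
    close(fd);
    free(buf);
    return 0;
}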

Thanks,

John.



