On Tue, Mar 13, 2012 at 2:47 PM, Amos Kong <akong@xxxxxxxxxx> wrote:
> ----- Original Message -----
>> On Tue, Mar 13, 2012 at 11:51 AM, Amos Kong <akong@xxxxxxxxxx> wrote:
>> > On 13/03/12 19:23, Stefan Hajnoczi wrote:
>> >> On Tue, Mar 13, 2012 at 10:42 AM, Amos Kong <akong@xxxxxxxxxx> wrote:
>> >>> Booting a guest with 232 virtio-blk disks makes qemu abort because
>> >>> it fails to allocate an ioeventfd. This patchset changes
>> >>> kvm_has_many_ioeventfds() to check whether a free ioeventfd is
>> >>> available. If not, virtio-pci falls back to userspace and does not
>> >>> use ioeventfd for I/O notification.
>> >
>> > Hi Stefan,
>> >
>> >> Please explain how it fails with 232 devices. Where does it abort
>> >> and why?
>> >
>> > (gdb) bt
>> > #0  0x00007ffff48c8885 in raise () from /lib64/libc.so.6
>> > #1  0x00007ffff48ca065 in abort () from /lib64/libc.so.6
>> > #2  0x00007ffff7e89a3d in kvm_io_ioeventfd_add (section=0x7fffbfbf5610, match_data=true, data=0, fd=461) at /home/devel/qemu/kvm-all.c:778
>> > #3  0x00007ffff7e89b3f in kvm_eventfd_add (listener=0x7ffff82ebe80, section=0x7fffbfbf5610, match_data=true, data=0, fd=461) at /home/devel/qemu/kvm-all.c:802
>> > #4  0x00007ffff7e9bcf7 in address_space_add_del_ioeventfds (as=0x7ffff8b278a0, fds_new=0x7fffb80106f0, fds_new_nb=201, fds_old=0x7fffb800db20, fds_old_nb=200) at /home/devel/qemu/memory.c:612
>> > #5  0x00007ffff7e9c04f in address_space_update_ioeventfds (as=0x7ffff8b278a0) at /home/devel/qemu/memory.c:645
>> > #6  0x00007ffff7e9caa0 in address_space_update_topology (as=0x7ffff8b278a0) at /home/devel/qemu/memory.c:726
>> > #7  0x00007ffff7e9cb95 in memory_region_update_topology (mr=0x7fffdeb179b0) at /home/devel/qemu/memory.c:746
>> > #8  0x00007ffff7e9e802 in memory_region_add_eventfd (mr=0x7fffdeb179b0, addr=16, size=2, match_data=true, data=0, fd=461) at /home/devel/qemu/memory.c:1220
>> > #9  0x00007ffff7d9e832 in virtio_pci_set_host_notifier_internal (proxy=0x7fffdeb175a0, n=0, assign=true) at /home/devel/qemu/hw/virtio-pci.c:175
>> > #10 0x00007ffff7d9ea5f in virtio_pci_start_ioeventfd (proxy=0x7fffdeb175a0) at /home/devel/qemu/hw/virtio-pci.c:230
>> > #11 0x00007ffff7d9ee51 in virtio_ioport_write (opaque=0x7fffdeb175a0, addr=18, val=7) at /home/devel/qemu/hw/virtio-pci.c:325
>> > #12 0x00007ffff7d9f37b in virtio_pci_config_writeb (opaque=0x7fffdeb175a0, addr=18, val=7) at /home/devel/qemu/hw/virtio-pci.c:457
>> > #13 0x00007ffff7e9ac23 in memory_region_iorange_write (iorange=0x7fffb8005cc0, offset=18, width=1, data=7) at /home/devel/qemu/memory.c:427
>> > #14 0x00007ffff7e857e2 in ioport_writeb_thunk (opaque=0x7fffb8005cc0, addr=61970, data=7) at /home/devel/qemu/ioport.c:212
>> > #15 0x00007ffff7e85197 in ioport_write (index=0, address=61970, data=7) at /home/devel/qemu/ioport.c:83
>> > #16 0x00007ffff7e85d9a in cpu_outb (addr=61970, val=7 '\a') at /home/devel/qemu/ioport.c:289
>> > #17 0x00007ffff7e8a70a in kvm_handle_io (port=61970, data=0x7ffff7c11000, direction=1, size=1, count=1) at /home/devel/qemu/kvm-all.c:1123
>> > #18 0x00007ffff7e8ad0a in kvm_cpu_exec (env=0x7fffc1688010) at /home/devel/qemu/kvm-all.c:1271
>> > #19 0x00007ffff7e595fc in qemu_kvm_cpu_thread_fn (arg=0x7fffc1688010) at /home/devel/qemu/cpus.c:733
>> > #20 0x00007ffff63687f1 in start_thread () from /lib64/libpthread.so.0
>> > #21 0x00007ffff497b92d in clone () from /lib64/libc.so.6
>> > (gdb) frame 2
>> > #2  0x00007ffff7e89a3d in kvm_io_ioeventfd_add (section=0x7fffbfbf5610, match_data=true, data=0, fd=461) at /home/devel/qemu/kvm-all.c:778
>> > 778             abort();
>> > (gdb) l
>> > 773         assert(match_data && section->size == 2);
>> > 774
>> > 775         r = kvm_set_ioeventfd_pio_word(fd, section->offset_within_address_space,
>> > 776                                        data, true);
>> > 777         if (r < 0) {
>> > 778             abort();
>> > 779         }
>>
>> This is where graceful fallback code needs to be added. I have added
>> Avi because I'm not very familiar with the new memory API and how it
>> does ioeventfd.
>
> Hi, Stefan
>
> I thought of a fix here, but virtio_pci_start_ioeventfd() cannot know
> that the allocation failed.
>
> diff --git a/kvm-all.c b/kvm-all.c
> index 77eadf6..7157e78 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -771,6 +771,8 @@ static void kvm_io_ioeventfd_add(MemoryRegionSection *sectio
>
>      r = kvm_set_ioeventfd_pio_word(fd, section->offset_within_address_space,
>                                     data, true);
> +    if (r == -ENOSPC)
> +        return;

The caller needs a way to detect the failure.

>      if (r < 0) {
>          abort();
>      }
>

>> Basically we need a way for ioeventfd to fail if we hit the rlimit or
>> the in-kernel io bus device limit. Suggestions?
>

>> (The reason I don't like using has_many_ioeventfds() is because it's
>> ugly and inefficient to open and then close file descriptors -
>> especially in a multithreaded program where file descriptor
>> availability can change while we're executing.)
>
> I agree it's too ugly ;)
> but I want to keep it for older kernels (where the iobus limit is 6).
>
> Can we remove the check for old kernels (iobus limit of 6)? The user
> can disable ioeventfd through the qemu command line:
>   virtio-net-pci.ioeventfd=on/off
>   virtio-blk-pci.ioeventfd=on/off

Why do you want to remove the iobus limit 6 check? The point of that
check is to allow vhost-net to always work since it requires an
ioeventfd. Userspace virtio devices, on the other hand, can gracefully
fall back to non-ioeventfd.

Stefan
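
To illustrate the fallback being discussed, here is a rough sketch,
assuming a hypothetical variant of the memory API in which
memory_region_add_eventfd() reports failure (e.g. -ENOSPC when the KVM
io bus has no free slots) instead of the listener aborting. The shape
follows virtio_pci_set_host_notifier_internal() from the backtrace
above; the int return of memory_region_add_eventfd() does not exist in
QEMU and is an assumption for the sake of the example.

    /*
     * Hypothetical sketch only -- not the current QEMU API.  Assume
     * memory_region_add_eventfd() returns a negative errno (e.g. -ENOSPC
     * when the KVM io bus is full) instead of the listener aborting.
     * virtio_pci_start_ioeventfd() could then see the error from this
     * helper and fall back to userspace notification.
     */
    static int virtio_pci_set_host_notifier_internal(VirtIOPCIProxy *proxy,
                                                     int n, bool assign)
    {
        VirtQueue *vq = virtio_get_queue(proxy->vdev, n);
        EventNotifier *notifier = virtio_queue_get_host_notifier(vq);
        int r;

        if (assign) {
            r = event_notifier_init(notifier, 1);
            if (r < 0) {
                return r;
            }
            /* Assumed int return; the real function is void. */
            r = memory_region_add_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY,
                                          2, true, n,
                                          event_notifier_get_fd(notifier));
            if (r < 0) {
                event_notifier_cleanup(notifier);
                return r;   /* caller keeps using userspace notification */
            }
        } else {
            memory_region_del_eventfd(&proxy->bar, VIRTIO_PCI_QUEUE_NOTIFY,
                                      2, true, n,
                                      event_notifier_get_fd(notifier));
            event_notifier_cleanup(notifier);
        }
        return 0;
    }

For the error to reach this point at all, kvm_io_ioeventfd_add() would
have to propagate it through the memory listener instead of calling
abort(); Amos's diff above avoids the abort but silently drops the
failure, which is why the caller still cannot detect it.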