On 24/11/2015 07:29, Ming Lin wrote:
>> Here are the new performance numbers:
>>
>> qemu-nvme + google-ext + eventfd: 294MB/s
>> virtio-blk: 344MB/s
>> virtio-scsi: 296MB/s
>>
>> It's almost the same as virtio-scsi. Nice.

Pretty good indeed.

> Looks like "regular MMIO" runs in the vcpu thread, while "eventfd MMIO"
> runs in the main loop thread.
>
> Could you help to explain why eventfd MMIO gets better performance?

Because VCPU latency is really everything if the I/O is very fast _or_
the queue depth is high; signaling an eventfd is cheap enough to give a
noticeable reduction in VCPU latency.

Waking up a sleeping process is a bit expensive, but if you manage to
keep the iothread close to 100% CPU, the main loop thread's poll() is
usually quite cheap too.

> call stack: regular MMIO
> ========================
> nvme_mmio_write (qemu/hw/block/nvme.c:921)
> memory_region_write_accessor (qemu/memory.c:451)
> access_with_adjusted_size (qemu/memory.c:506)
> memory_region_dispatch_write (qemu/memory.c:1158)
> address_space_rw (qemu/exec.c:2547)
> kvm_cpu_exec (qemu/kvm-all.c:1849)
> qemu_kvm_cpu_thread_fn (qemu/cpus.c:1050)
> start_thread (pthread_create.c:312)
> clone
>
> call stack: eventfd MMIO
> ========================
> nvme_sq_notifier (qemu/hw/block/nvme.c:598)
> aio_dispatch (qemu/aio-posix.c:329)
> aio_ctx_dispatch (qemu/async.c:232)
> g_main_context_dispatch
> glib_pollfds_poll (qemu/main-loop.c:213)
> os_host_main_loop_wait (qemu/main-loop.c:257)
> main_loop_wait (qemu/main-loop.c:504)
> main_loop (qemu/vl.c:1920)
> main (qemu/vl.c:4682)
> __libc_start_main

For comparison, here is the "iothread + eventfd MMIO" stack:

    nvme_sq_notifier (qemu/hw/block/nvme.c:598)
    aio_dispatch (qemu/aio-posix.c:329)
    aio_poll (qemu/aio-posix.c:474)
    iothread_run (qemu/iothread.c:170)
    __libc_start_main

aio_poll() is much more specialized than the main loop (which uses glib
and therefore wraps aio_poll() in a GSource adapter), so it can be
faster too.  (That said, things are still a bit in flux here: QEMU 2.6
will see fairly heavy changes in this area, but the API will stay the
same.)

Even more performance can be squeezed out by adding a little busy
waiting to aio_poll() before it falls back to the blocking poll(); this
avoids paying the sleep/wakeup cost for very short idle periods.  A
standalone sketch of that pattern follows below.

BTW, you may want to Cc qemu-block@xxxxxxxxxx in addition to
qemu-devel@xxxxxxxxxx.  Most people are on both lists, but some notice
things faster when you write to the lower-traffic qemu-block list.

Paolo
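P.S. For readers wondering what the "eventfd MMIO" wiring looks like on
the device side, the mechanism is KVM's ioeventfd: the device registers
an eventfd for its doorbell address, so a guest write signals the
eventfd directly and never goes through the vcpu thread's MMIO dispatch
path.  A minimal sketch follows; memory_region_add_eventfd() and the
EventNotifier helpers are real QEMU APIs, but SQ_DOORBELL_OFFSET and
nvme_sq_notifier_cb are made-up names, and the exact
aio_set_event_notifier() signature has shifted between QEMU releases:

    /* Sketch: route guest writes to a doorbell register through an
     * eventfd instead of the regular MMIO dispatch path. */
    static EventNotifier sq_notifier;

    static void nvme_sq_notifier_cb(EventNotifier *e)
    {
        event_notifier_test_and_clear(e);  /* reset the counter */
        /* process new submission queue entries here */
    }

    static void nvme_register_doorbell(MemoryRegion *mr, AioContext *ctx)
    {
        event_notifier_init(&sq_notifier, 0);

        /* Signal the eventfd on any 4-byte write to the doorbell;
         * match_data=false means the written value is ignored. */
        memory_region_add_eventfd(mr, SQ_DOORBELL_OFFSET, 4,
                                  false, 0, &sq_notifier);

        /* Dispatch the notifier in the given AioContext: the main
         * loop's context, or a dedicated iothread's. */
        aio_set_event_notifier(ctx, &sq_notifier, nvme_sq_notifier_cb);
    }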
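The dedicated-iothread case in the third stack is also easy to picture:
the thread does nothing but service its own AioContext, with no glib
GSource indirection at all.  Roughly (a paraphrase of qemu/iothread.c,
not the literal code):

    /* Approximately what iothread_run() does: poll the thread's
     * AioContext directly, bypassing the glib main loop. */
    static void *iothread_run(void *opaque)
    {
        IOThread *iothread = opaque;

        while (!atomic_read(&iothread->stopping)) {
            aio_poll(iothread->ctx, true);  /* blocking poll */
        }
        return NULL;
    }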
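And here is the busy-waiting idea reduced to a self-contained POSIX
sketch.  QEMU's real version would live inside aio_poll(); the eventfd
below stands in for the AioContext's notifier, and SPIN_ITERATIONS is a
hypothetical tuning knob:

    #include <errno.h>
    #include <poll.h>
    #include <stdint.h>
    #include <unistd.h>

    #define SPIN_ITERATIONS 10000       /* hypothetical spin budget */

    /* Wait for an eventfd created with EFD_NONBLOCK: spin briefly so
     * that very short idle periods never pay the sleep/wakeup cost,
     * then fall back to a blocking poll(). */
    static void wait_for_event(int efd)
    {
        uint64_t val;
        int i;

        for (i = 0; i < SPIN_ITERATIONS; i++) {
            if (read(efd, &val, sizeof(val)) == sizeof(val)) {
                return;                 /* signaled while spinning */
            }
            /* read() returned EAGAIN: not signaled yet, keep going */
        }

        /* Spin budget exhausted; sleep until the fd is readable. */
        struct pollfd pfd = { .fd = efd, .events = POLLIN };
        while (poll(&pfd, 1, -1) < 0 && errno == EINTR) {
            /* interrupted by a signal; retry */
        }
        read(efd, &val, sizeof(val));   /* consume the counter */
    }

The win comes from the same place as the numbers above: under a
back-to-back load the notifier is usually already signaled by the time
the previous request finishes, so the blocking path is rarely taken.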