On Wed, Mar 14, 2012 at 10:46 AM, Avi Kivity <avi@xxxxxxxxxx> wrote:
> On 03/14/2012 12:39 PM, Stefan Hajnoczi wrote:
>> On Wed, Mar 14, 2012 at 10:05 AM, Avi Kivity <avi@xxxxxxxxxx> wrote:
>> > On 03/14/2012 11:59 AM, Stefan Hajnoczi wrote:
>> >> On Wed, Mar 14, 2012 at 9:22 AM, Avi Kivity <avi@xxxxxxxxxx> wrote:
>> >> > On 03/13/2012 12:42 PM, Amos Kong wrote:
>> >> >> When booting a guest with 232 virtio-blk disks, qemu aborts because
>> >> >> it fails to allocate an ioeventfd. This patchset changes
>> >> >> kvm_has_many_ioeventfds() to check whether an ioeventfd is still
>> >> >> available. If not, virtio-pci falls back to userspace and does not
>> >> >> use ioeventfd for I/O notification.
>> >> >
>> >> > How about an alternative way of solving this, within the memory core:
>> >> > trap those writes in qemu and write to the ioeventfd yourself. This way
>> >> > ioeventfds work even without kvm:
>> >> >
>> >> > core: create eventfd
>> >> > core: install handler for memory address that writes to ioeventfd
>> >> > kvm (optional): install kernel handler for ioeventfd
>> >> >
>> >> > Even if the third step fails, the ioeventfd still works; it's just slower.
>> >>
>> >> That approach will penalize guests with large numbers of disks: they
>> >> see an extra switch to the vcpu thread instead of kvm.ko -> iothread.
>> >
>> > It's only a failure path. The normal path is expected to have a kvm
>> > ioeventfd installed.
>>
>> It's the normal path when you attach >232 virtio-blk devices to a
>> guest (or 300 in the future).
>
> Well, there's nothing we can do about it.
>
> We'll increase the limit of course, but old kernels will remain out
> there. The right fix is virtio-scsi anyway.
>
>> >> It seems okay provided we can solve the limit in the kernel once and
>> >> for all by introducing a more dynamic data structure for in-kernel
>> >> devices. That way future kernels will never hit an arbitrary limit
>> >> below their file descriptor rlimit.
>> >>
>> >> Is there some reason why kvm.ko must use a fixed-size array? Would it
>> >> be possible to use a tree (maybe with a cache for recent lookups)?
>> >
>> > It does use bsearch today IIRC. We'll expand the limit, but there must
>> > be a limit, and qemu must be prepared to deal with it.
>>
>> Shouldn't the limit be the file descriptor rlimit? If userspace
>> cannot create more eventfds then it cannot set up more ioeventfds.
>
> You can use the same eventfd for multiple ioeventfds. If you mean to
> slave kvm's ioeventfd limit to the number of files the process can have,
> that's a good idea. Surely an ioeventfd occupies fewer resources than an
> open file.

Yes.

Ultimately I guess you're right in that we still need an error path, and
virtio-scsi will reduce the pressure on I/O eventfds for storage.

Stefan
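A minimal sketch of the three-step fallback Avi proposes, in C, assuming Linux eventfd(2). Every name here (struct notifier, try_register_in_kernel(), notifier_init(), notifier_handle_write(), notifier_drain()) is hypothetical, not QEMU's API. The kernel step is a stub standing in for the real KVM_IOEVENTFD ioctl; it fails after a made-up slot budget to simulate hitting kvm.ko's limit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical notifier: one doorbell address backed by an eventfd. */
struct notifier {
    uint64_t addr;   /* guest address of the doorbell register */
    int efd;         /* eventfd signalled on every doorbell write */
    bool in_kernel;  /* true if the (simulated) kernel fast path exists */
};

/* Stub for the optional in-kernel registration. A real implementation
 * would issue ioctl(vm_fd, KVM_IOEVENTFD, ...); here it simply fails
 * once a hypothetical slot budget is used up. */
static int kernel_slots_left = 2;

static int try_register_in_kernel(struct notifier *n)
{
    (void)n;
    if (kernel_slots_left == 0)
        return -ENOSPC;
    kernel_slots_left--;
    return 0;
}

/* Steps 1 and 3: create the eventfd, then opportunistically add the
 * kernel fast path. A failure in step 3 is not an error. */
static int notifier_init(struct notifier *n, uint64_t addr)
{
    n->addr = addr;
    n->efd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    if (n->efd < 0)
        return -errno;
    n->in_kernel = (try_register_in_kernel(n) == 0);
    return 0;
}

/* Step 2: the userspace handler the (hypothetical) memory core would
 * call on a trapped write. It signals the same eventfd, so the consumer
 * cannot tell which path delivered the notification. */
static void notifier_handle_write(struct notifier *n, uint64_t addr)
{
    uint64_t one = 1;
    assert(addr == n->addr);
    ssize_t r = write(n->efd, &one, sizeof(one));
    assert(r == (ssize_t)sizeof(one));
    (void)r;
}

/* Consumer side (e.g. the iothread): read and reset the counter.
 * Returns 0 when nothing is pending (read fails with EAGAIN). */
static uint64_t notifier_drain(struct notifier *n)
{
    uint64_t count = 0;
    if (read(n->efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return 0;
    return count;
}
```

Because both paths signal the same eventfd, the consumer is unchanged; only the cost differs. Devices that fail step 3 take an exit to userspace per notification, which is the penalty Stefan points out for guests with many disks.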
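Avi recalls that kvm.ko already looks devices up with bsearch, and the thread settles on keeping a limit that userspace must handle, possibly tied to the open-file rlimit. The C sketch below models that idea under stated assumptions; it is not the actual kvm.ko code. A hypothetical struct io_bus keeps a sorted array of address ranges searched with the C library's bsearch(). Registration returns -ENOSPC when full so the caller can fall back to a userspace handler, and several entries may share one eventfd. bus_limit_from_rlimit() is one hypothetical way to derive the cap from RLIMIT_NOFILE.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

/* Hypothetical bus entry: an address range mapped to an eventfd.
 * Several entries may carry the same efd. */
struct bus_dev {
    uint64_t addr;
    uint64_t len;
    int efd;
};

struct io_bus {
    size_t count;
    size_t max;            /* hard limit on registered devices */
    struct bus_dev *devs;  /* kept sorted by addr */
};

/* bsearch comparator: key is an address, element is a range. */
static int cmp_addr_range(const void *key, const void *elem)
{
    uint64_t a = *(const uint64_t *)key;
    const struct bus_dev *d = elem;
    if (a < d->addr)
        return -1;
    if (a >= d->addr + d->len)
        return 1;
    return 0;
}

/* Assumed policy: cap the bus at the open-file soft limit, with an
 * arbitrary fallback of 1024 if the limit is unknown or unlimited. */
static size_t bus_limit_from_rlimit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return 1024;
    return (size_t)rl.rlim_cur;
}

static int io_bus_init(struct io_bus *bus, size_t max)
{
    bus->count = 0;
    bus->max = max;
    bus->devs = calloc(max, sizeof(*bus->devs));
    return bus->devs ? 0 : -ENOMEM;
}

/* Insert, keeping the array sorted. -ENOSPC tells the caller to fall
 * back to userspace. Overlap checks are omitted for brevity. */
static int io_bus_register(struct io_bus *bus, uint64_t addr, uint64_t len,
                           int efd)
{
    size_t i = 0;
    if (bus->count == bus->max)
        return -ENOSPC;
    while (i < bus->count && bus->devs[i].addr < addr)
        i++;
    memmove(&bus->devs[i + 1], &bus->devs[i],
            (bus->count - i) * sizeof(*bus->devs));
    bus->devs[i] = (struct bus_dev){ .addr = addr, .len = len, .efd = efd };
    bus->count++;
    return 0;
}

/* O(log n) lookup of the device covering addr, or NULL if none. */
static const struct bus_dev *io_bus_find(const struct io_bus *bus,
                                         uint64_t addr)
{
    return bsearch(&addr, bus->devs, bus->count, sizeof(*bus->devs),
                   cmp_addr_range);
}
```

The lookup stays logarithmic whatever the cap is, so raising the limit (or deriving it from the rlimit) does not change the per-notification cost much; the remaining cost is memory for the array, which is why a limit still exists.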