Michael S. Tsirkin wrote:
> On Sun, May 31, 2009 at 11:30:48PM +0300, Avi Kivity wrote:
>> Michael S. Tsirkin wrote:
>>>> Version N of irqfd actually had the kernel create the fd, due to
>>>> concerns about eventfd's flexibility (thread wakeup vs function
>>>> call).  As it turned out these concerns were misplaced (well, we
>>>> still want the call to happen in process context when available).
>>>
>>> I'm afraid there are deep lifetime issues there, and the recent patch
>>> calling eventfd_fget seems to be just papering over the worst of them.
>>
>> You'll have to be more specific.
>
> My concern is that we do fget on eventfd and keep this reference until
> fput is done on vm fd.

Hi Michael,

This is not really the full picture, and I think it might be where all
the confusion starts. You are only covering the case where kvm is the
first to close (and if you think about it, you need to handle that case
as well, just like me, or the tables are turned).

We both agree that an irqfd or irqfd-like concept and kvm have a
relationship with one another, and that we have to manage that
relationship, right? The relationship starts with an IRQFD_ASSIGN, and
it stops when either the irqfd is closed or the kvm is closed
(whichever comes first).

The lifetimes are actually identical with your proposal if you think
about it. Only the mechanics of how to get there are (slightly)
different. That is, if the irqfd wants to close first, you do an
ioctl(kvmfd, IRQFD_DEASSIGN) + close(irqfd). If kvm wants to close
first, you do a close(kvmfd). I do not think there is really any issue
with lifetimes there.

I suppose you could argue: "well, what if they do the close(irqfd) but
not the ioctl() (or vice versa)?", and to that I would say that it's no
different than if userspace forgot to do "X" for any other resource.
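For illustration, the two teardown orders described above might look
roughly like this from userspace. This is a sketch under assumptions,
not code from the patch: it uses the KVM_IRQFD ioctl and
KVM_IRQFD_FLAG_DEASSIGN flag as spelled in the mainline <linux/kvm.h>
header (the thread refers to them as IRQFD_ASSIGN/IRQFD_DEASSIGN), and
the helper names are hypothetical.

```c
/* Sketch only: the two teardown orders for an irqfd/kvm binding.
 * Assumes the mainline <linux/kvm.h> spelling of the interface;
 * all function names below are hypothetical. */
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Issue KVM_IRQFD on the vm fd for (efd, gsi) with the given flags. */
static int irqfd_request(int vmfd, int efd, uint32_t gsi, uint32_t flags)
{
    struct kvm_irqfd req;

    memset(&req, 0, sizeof(req));
    req.fd = (uint32_t)efd;
    req.gsi = gsi;
    req.flags = flags;
    return ioctl(vmfd, KVM_IRQFD, &req);
}

/* Start the relationship: bind the eventfd to a guest interrupt. */
int irqfd_assign(int vmfd, int efd, uint32_t gsi)
{
    return irqfd_request(vmfd, efd, gsi, 0);
}

/* irqfd closes first: explicit deassign, then close the eventfd. */
int irqfd_close_first(int vmfd, int efd, uint32_t gsi)
{
    int rc = irqfd_request(vmfd, efd, gsi, KVM_IRQFD_FLAG_DEASSIGN);

    close(efd);   /* second system call; the fd is released either way */
    return rc;    /* 0 on success, -1 with errno set on failure */
}

/* kvm closes first: a single close() on the vm fd. Userspace still
 * owns the eventfd and frees it later, explicitly or at task exit. */
int kvm_close_first(int vmfd)
{
    return close(vmfd);
}
```

Skipping the deassign step in irqfd_close_first() corresponds to the
"forgot to do X" case above.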
The fact is that userspace holds a number of kernel resources, and they
can either be explicitly freed (such as with a close()), or they will be
implicitly freed when the task exits. I think all of these requirements
are met here, so I do not see a problem.

Yes, I agree that having to do two system calls to completely close it
is not as attractive as one, but the alternative is potentially not
using eventfd as the underlying basis for the construct. There are
distinct advantages to using eventfd here, so we would like to continue
to do so unless someone can show a compelling reason not to. So far I am
not seeing such a reason.

A potential compromise is to investigate the POLLHUP technique that
Davide mentioned so that kvmfd can get notified of the closure without
needing an additional explicit ioctl to do it. Note that we already have
irqfd in the tree, so I assume we would need to do this in an
ABI-friendly way, but it's possible.

> This works as long as no one else does
> similar tricks. Imagine for example eventfd or another fs/ change that makes
> eventfd do fget on descriptor X and keep it until fput is done on eventfd.
> We'll get resource leak if kvm fd is substituted for X.

I don't think it's a realistic concern that eventfd would ever be
grabbing other fds, but I think Avi answered this succinctly in his
reply to this mail, so I won't rehash it.

> What do you think?
>
>>>> I'd really like to stick with eventfd if we can solve all the
>>>> problems there, rather than creating yet another interface.
>>>> Especially if we want uio to communicate directly with kvm.
>>>
>>> Actually, current irqfd might not be able to handle assigned pci devices
>>> because of the trick it does with set_irq(1)/set_irq(0) trick.
>>> Guest drivers for pci devices likely assume the interrupt
>>> is level.
>>
>> Right.  I'm willing to have some userspace mediation for level-triggered
>> interrupts.
>
> In other words, you want to keep using KVM_IRQ_LINE for this, as well?

Or more specifically, if you need something more than a basic edge
interrupt, you should use the existing interfaces. We put a stake in the
ground during review: irqfd would only support interfaces that can do
MSI/edge-like injections.

>> It's a corner case anyway as we don't support shared
>> interrupts on the host, and PCI level-triggered interrupts are very
>> likely to be shared.
>
> If you think about virtio-net-host, there's no host interrupt there.
>
>>> With virt devices, what we'd do is create a virt device that attaches to
>>> uio driver.  This would handle interrupts and everything else that needs
>>> to live in kernel
>>
>> With irqfd, what we do is attach an eventfd to the MSI we're interested
>> in.  Given that eventfds are usable from userspace, we're adding a
>> non-virt-specific interface to uio that serves kvm well.  Both uio and
>> kvm win.
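As a small illustration of the quoted point that eventfds are usable
from plain userspace, the sketch below signals an eventfd a few times
and reads the accumulated count back. It uses only eventfd(2), write(2)
and read(2); no kvm or uio involvement is shown, and the function name
signal_and_drain() is hypothetical.

```c
/* Sketch only: signalling and draining an eventfd from userspace. */
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Signal a fresh eventfd n times, then read the counter back.
 * Returns the value read, or 0 on any failure (including n == 0,
 * where the non-blocking read finds nothing pending). */
uint64_t signal_and_drain(unsigned int n)
{
    uint64_t one = 1;
    uint64_t total = 0;
    int efd = eventfd(0, EFD_NONBLOCK);

    if (efd < 0)
        return 0;

    /* Each 8-byte write adds its value to the kernel-side counter. */
    for (unsigned int i = 0; i < n; i++) {
        if (write(efd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
            close(efd);
            return 0;
        }
    }

    /* One read returns the accumulated count and resets it to zero. */
    if (read(efd, &total, sizeof(total)) != (ssize_t)sizeof(total))
        total = 0;

    close(efd);
    return total;
}
```

Any producer that can write to a file descriptor can raise the event
this way, which is what makes the interface non-virt-specific.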