Re: [PATCH 0/3] RFC: virtual device as irq injection interface

Gregory Haskins <ghaskins@xxxxxxxxxx> · Mon, 01 Jun 2009 08:00:17 -0400

Michael S. Tsirkin wrote:
> On Sun, May 31, 2009 at 11:30:48PM +0300, Avi Kivity wrote:
>   
>> Michael S. Tsirkin wrote:
>>     
>>>> Version N of irqfd actually had the kernel create the fd, due to   
>>>> concerns about eventfd's flexibility (thread wakeup vs function 
>>>> call).   As it turned out these concerns were misplaced (well, we 
>>>> still want the call to happen in process context when available).
>>>>     
>>> I'm afraid there are deep lifetime issues there, and the recent patch
>>> calling eventfd_fget seems to be just papering over the worst of them.
>>>   
>> You'll have to be more specific.
>>     
>
> My concern is that we do fget on eventfd and keep this reference until
> fput is done on vm fd.

Hi Michael,
  This is not really the full picture, and I think it may be where all
the confusion starts.  You are only covering the case where kvm is the
first to close (and if you think about it, your design has to handle
that case too, just as mine does, or else its mirror image with the
tables turned).

We both agree that an irqfd or irqfd-like concept and kvm have a
relationship with one another, and that we have to manage that
relationship, right?  The relationship starts with an IRQFD_ASSIGN, and
it stops when either the irqfd is closed, or if the kvm is closed
(whichever comes first).  The lifetimes are actually identical with your
proposal if you think about it.  Only the mechanics of how to get there
are (slightly) different.

That is, if the irqfd side wants to close first, you do an
ioctl(kvmfd, IRQFD_DEASSIGN) followed by close(irqfd).  If kvm wants to
close first, you do a close(kvmfd).  I do not think there is really any
lifetime issue there.
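
To make the "whichever comes first" argument concrete, here is a minimal
userspace model of the association.  It is not KVM code: struct
irqfd_assoc and the helper functions are hypothetical names, and the
real teardown also has to deal with locking and concurrency that this
sketch ignores.  It only shows that both close paths funnel into one
idempotent teardown, so the order in which userspace closes things does
not matter.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model (not the actual KVM code): one irqfd <-> kvm
 * association created by IRQFD_ASSIGN.  Whichever side closes first
 * tears it down; the later close finds nothing left to do.
 */
struct irqfd_assoc {
    bool active;    /* true between IRQFD_ASSIGN and the first teardown */
    int teardowns;  /* number of times teardown actually ran */
};

/* Models IRQFD_ASSIGN: the relationship starts here. */
static void irqfd_assoc_assign(struct irqfd_assoc *a)
{
    a->active = true;
    a->teardowns = 0;
}

/* Shared teardown: idempotent, so the second caller is a no-op. */
static void irqfd_assoc_teardown(struct irqfd_assoc *a)
{
    if (!a->active)
        return;                 /* the other side already ended it */
    assert(a->teardowns == 0);  /* invariant: teardown runs at most once */
    a->active = false;
    a->teardowns++;
}

/* Path 1: ioctl(kvmfd, IRQFD_DEASSIGN) followed by close(irqfd). */
static void irqfd_side_closes(struct irqfd_assoc *a)
{
    irqfd_assoc_teardown(a);
}

/* Path 2: close(kvmfd), either explicit or implicit at task exit. */
static void kvm_side_closes(struct irqfd_assoc *a)
{
    irqfd_assoc_teardown(a);
}
```

Either close order leaves teardowns at exactly 1; the second close is a
no-op.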

I suppose you could argue: "well, what if they do the close(irqfd) but
not the ioctl() (or vice versa)?", and to that I would say that it's no
different than if userspace forgot to release any other resource.  The
fact is that userspace holds a number of kernel resources, and they are
either freed explicitly (such as with a close()) or freed implicitly
when the task exits.  I think all of these requirements are met here, so
I do not see a problem.

Yes, I agree that having to do two system calls to completely close it
is not as attractive as one, but the alternative would likely mean not
using eventfd as the underlying basis for the construct.  There are
distinct advantages to using eventfd here, so we would like to continue
to do so unless someone can show a compelling reason not to.  So far I
am not seeing such a reason.

A potential compromise is to investigate the POLLHUP technique that
Davide mentioned, so that kvmfd can be notified of the closure without
needing an additional explicit ioctl.  Note that irqfd is already in
the tree, so I assume we would need to do this in an ABI-friendly way,
but it's possible.

>  This works as long as no one else does
> similar tricks. Imagine for example eventfd or another fs/ change that makes
> eventfd do fget on descriptor X and keep it until fput is done on eventfd.
> We'll get a resource leak if kvm fd is substituted for X.
>   

I don't think it's a realistic concern that eventfd would ever grab
other fds, but Avi answered this succinctly in his reply to this mail,
so I won't rehash it.

> What do you think?
>
>   
>>>   
>>>> I'd really like to stick with eventfd if we can solve all the 
>>>> problems  there, rather than creating yet another interface.
>>>> Especially if we want uio to communicate directly with kvm.
>>>>     
>>> Actually, current irqfd might not be able to handle assigned pci devices
>>> because of the set_irq(1)/set_irq(0) trick it does.
>>> Guest drivers for pci devices likely assume the interrupt
>>> is level.
>>>   
>> Right.  I'm willing to have some userspace mediation for level-triggered  
>> interrupts.
>>     
>
> In other words, you want to keep using KVM_IRQ_LINE for this, as well?
>   

Or more specifically, if you need something more than a basic edge
interrupt, you should use the existing interfaces.  We drove a stake in
the ground during review: irqfd would only support sources that can be
injected MSI/edge-style.
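
The edge-only restriction follows from what an eventfd can express.  The
sketch below exercises eventfd(2) from userspace (Linux-specific) to
show it: writes add to a 64-bit counter, a read returns the accumulated
count and resets it to zero, and nothing remains "asserted" afterwards.
That maps onto an edge or MSI-style event, but there is no way to say
"the line is still high", which is what a level-triggered device needs.
This illustrates eventfd semantics only, not the irqfd code path (the
in-kernel irqfd reacts to the eventfd's wakeup rather than calling
read()), and the helper names are made up for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Signal an eventfd n times (as a device would, once per event), then
 * consume it with a single read().  The counter coalesces the signals,
 * so the read returns n.  Returns -1 if nothing was pending (EAGAIN on
 * this non-blocking fd) or on any other error.
 */
static int64_t signal_then_drain(int n)
{
    assert(n >= 0);
    int fd = eventfd(0, EFD_NONBLOCK);
    if (fd < 0)
        return -1;
    for (int i = 0; i < n; i++) {
        uint64_t one = 1;
        if (write(fd, &one, sizeof one) != (ssize_t)sizeof one) {
            close(fd);
            return -1;
        }
    }
    uint64_t count = 0;
    ssize_t r = read(fd, &count, sizeof count);
    close(fd);
    return r == (ssize_t)sizeof count ? (int64_t)count : -1;
}

/*
 * Once consumed, nothing stays "asserted": a second non-blocking read
 * fails with EAGAIN.  Returns 1 if that is what happens, 0 otherwise.
 */
static int nothing_pending_after_drain(void)
{
    int fd = eventfd(0, EFD_NONBLOCK);
    if (fd < 0)
        return 0;
    uint64_t v = 1;
    int ok = write(fd, &v, sizeof v) == (ssize_t)sizeof v &&
             read(fd, &v, sizeof v) == (ssize_t)sizeof v &&
             read(fd, &v, sizeof v) < 0 && errno == EAGAIN;
    close(fd);
    return ok;
}
```

A level-triggered line would need a separate "deassert" notification,
which is why such interrupts stay with KVM_IRQ_LINE and userspace
mediation.
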
>
>   
>> It's a corner case anyway as we don't support shared  
>> interrupts on the host, and PCI level-triggered interrupts are very  
>> likely to be shared.
>>     
>
> If you think about virtio-net-host, there's no host interrupt there.
>
>   
>>> With virt devices, what we'd do is create a virt device that attaches to
>>> uio driver.  This would handle interrupts and everything else that needs
>>> to live in the kernel.
>>>       
>> With irqfd, what we do is attach an eventfd to the MSI we're interested  
>> in.  Given that eventfds are usable from userspace, we're adding a  
>> non-virt-specific interface to uio that serves kvm well.  Both uio and  
>> kvm win.
>>     
