On Wed, Feb 08, 2017 at 09:26:10AM +0100, Paolo Bonzini wrote:
>
>
> On 08/02/2017 08:58, Peter Xu wrote:
> > This idea was invoked when I was trying to solve an emulated VT-d issue
> > when guest kernel setup incorrect IRTE. When that happens, instead of
> > raising error immediately, what we should do is to keep the error, and
> > inject this error to vIOMMU when the specific interrupt is triggered.
> >
> > However this is very hard to be achieved since for now vIOMMU is working
> > in userspace, while currently there is no simple way that kernel irq can
> > talk to a userspace program.
> >
> > With this patch, we can easily provide such a way that when guest fault
> > irq is triggered, kernel can notify user program by signaling the
> > corresponding eventfd handle
>
> I think I understand the scenario, but I don't understand why it needs
> kernel intervention.  Why couldn't this be handled entirely in
> userspace, without ever setting up a GSI route or irqfd in KVM?  In
> other words, you're doing
>
>     write(irqfd)              read(irq fault eventfd)
>         |                             ^
>         v                             |
>        KVM -------> KVM_GSI_ROUTING_EVENTFD
>
> but why is this needed as opposed to just
>
>     write(irqfd) ------> read(irq fault eventfd)
>
> ?

Paolo,

Thanks for pointing out this issue. This is one of my concerns as well,
and Jan has made the same comment before (sorry I forgot to cc Jan,
doing it now). I was trying to identify the risks of both solutions. I
posted this to help choose between the two, and currently this series
is my clumsy choice.

I agree that logically this can all be done in userspace. The problem
is that I am afraid we can't do it easily, and a lot of code may need
to be touched for QEMU to achieve a pure userspace solution: we would
not only need an irqfd and a virq to set up a route, we would also need
to prepare a fault sink for each irqfd. Not to mention that QEMU has
many assumptions that virq and irqfd are treated separately.
That would of course expand the test scope as well, covering all the
devices using irqfd. Of course we can try to abstract that out into
something common; I am just afraid that it will still be a relatively
big change, and I am still uncertain about its "size".

With this feature, everything will be fully under the control of the
vIOMMU in QEMU. It'll be fairly straightforward and clean. But of
course I understand your concern, since we should keep KVM as simple as
possible, even though this is only tens of lines of changes (I wouldn't
dare to post a series for this with more than 100 LOC :-).

And even if we don't like this feature bit, I am still not sure whether
I should go ahead on the QEMU side with a possibly big change that only
benefits error handling in the VT-d vIOMMU.

Any further comments would be welcome to help me settle this.

(Or maybe I should just put this problem aside for now. Compared to
solving this, I think fixing the IRTE bug in the kernel might be
easier :-)

Thanks,

-- peterx