Re: [RFC PATCH] VFIO: Add a parameter to force nonthread IRQ

Yunhong Jiang <yunhong.jiang@xxxxxxxxxxxxxxx> · Wed, 28 Oct 2015 10:50:13 -0700

On Wed, Oct 28, 2015 at 01:44:55AM +0100, Paolo Bonzini wrote:
> 
> 
> On 27/10/2015 22:26, Yunhong Jiang wrote:
> >> > On RT kernels however can you call eventfd_signal from interrupt
> >> > context?  You cannot call spin_lock_irqsave (which can sleep) from a
> >> > non-threaded interrupt handler, can you?  You would need a raw spin lock.
> > Thanks for pointing this out. Yes, we can't call spin_lock_irqsave on an RT
> > kernel. I will do it this way in the next patch. But I'm not sure whether
> > it's overkill to use raw_spinlock there, since eventfd_signal is used by
> > other callers as well.
> 
> No, I don't think you can use raw_spinlock there.  The problem is not
> just eventfd_signal, it is especially wake_up_locked_poll.  You cannot
> convert the whole workqueue infrastructure to use raw_spinlock.

You mean the waitqueue, not the workqueue, right? One option is to change
eventfd to use the simple wait queue, which is based on raw_spinlock. But using
the simple waitqueue in eventfd may in fact hurt real-time latency in scenarios
other than this one.
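For reference, a minimal userspace sketch of the simple-wait-queue idea, assuming a C11 `atomic_flag` busy-wait lock as a stand-in for raw_spinlock_t. Every name in it (raw_lock_model, swq_add, swq_wake_all, ...) is hypothetical, not the kernel API. It shows where the latency cost would come from: the lock never sleeps, so it could be taken from a non-threaded handler, but the whole wake-up loop runs with it held, and on an RT kernel a section holding a raw_spinlock is not preemptible, so its length grows with the number of waiters.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for raw_spinlock_t: a busy-wait lock that never sleeps. */
typedef struct {
	atomic_flag locked;
} raw_lock_model;

static void raw_lock(raw_lock_model *l)
{
	/* Spin until the flag was previously clear; no sleeping involved. */
	while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
		;
}

static void raw_unlock(raw_lock_model *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

#define SWQ_MAX 16

/* Hypothetical wait queue entry: just a "woken" flag instead of a task. */
struct swait_model_entry {
	bool woken;
};

struct swait_model_queue {
	raw_lock_model lock;
	struct swait_model_entry *waiters[SWQ_MAX];
	size_t count;
};

static void swq_init(struct swait_model_queue *q)
{
	atomic_flag_clear(&q->lock.locked);
	q->count = 0;
}

static void swq_add(struct swait_model_queue *q, struct swait_model_entry *e)
{
	raw_lock(&q->lock);
	assert(q->count < SWQ_MAX);
	e->woken = false;
	q->waiters[q->count++] = e;
	raw_unlock(&q->lock);
}

/*
 * Wake every waiter and return how many were woken. The whole loop runs
 * with the non-sleeping lock held, so the time spent under it grows with
 * the number of waiters (the latency concern for an RT kernel).
 */
static size_t swq_wake_all(struct swait_model_queue *q)
{
	size_t n;

	raw_lock(&q->lock);
	for (size_t i = 0; i < q->count; i++)
		q->waiters[i]->woken = true;
	n = q->count;
	q->count = 0;
	raw_unlock(&q->lock);
	return n;
}
```

After two swq_add() calls, swq_wake_all() returns 2 and sets both entries' woken flags. The same loop behind a sleeping lock would stay preemptible but could not be entered from a non-threaded interrupt handler, which is the trade-off above.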

> 
> Alex, would it make sense to use the IRQ bypass infrastructure always,
> not just for VT-d, to do the MSI injection directly from the VFIO
> interrupt handler and bypass the eventfd?  Basically this would add an
> RCU-protected list of consumers matching the token to struct
> irq_bypass_producer, and a
> 
> 	int (*inject)(struct irq_bypass_consumer *);
> 
> callback to struct irq_bypass_consumer.  If any callback returns true,
> the eventfd is not signaled.  The KVM implementation would be like this
> (compare with virt/kvm/eventfd.c):
> 
> 	/* Extracted out of irqfd_wakeup */
> 	static int
> 	irqfd_wakeup_pollin(struct kvm_kernel_irqfd *irqfd)
> 	{
> 		...
> 	}
> 
> 	/* Extracted out of irqfd_wakeup */
> 	static int
> 	irqfd_wakeup_pollhup(struct kvm_kernel_irqfd *irqfd)
> 	{
> 		...
> 	}
> 
> 	static int
> 	irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync,
> 		     void *key)
> 	{
> 		struct kvm_kernel_irqfd *irqfd =
> 			container_of(wait, struct kvm_kernel_irqfd, wait);
> 		unsigned long flags = (unsigned long)key;
> 
> 		if (flags & POLLIN)
> 			irqfd_wakeup_pollin(irqfd);
> 		if (flags & POLLHUP)
> 			irqfd_wakeup_pollhup(irqfd);
> 
> 		return 0;
> 	}
> 
> 	static int kvm_arch_irq_bypass_inject(
> 		struct irq_bypass_consumer *cons)
> 	{
> 		struct kvm_kernel_irqfd *irqfd =
> 			container_of(cons, struct kvm_kernel_irqfd,
> 				     consumer);
> 
> 		irqfd_wakeup_pollin(irqfd);
> 		return 1;	/* injected, so the eventfd need not be signaled */
> 	}
> 
This is a good idea IMHO. So for MSI interrupts, kvm_arch_irq_bypass_inject
will be used and irqfd_wakeup will no longer be invoked, am I right?
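To check that I read the proposal correctly, here is a minimal userspace model of the dispatch in the VFIO interrupt handler. Everything in it is hypothetical: a plain array stands in for the RCU-protected consumer list, a counter stands in for eventfd_signal(), and kvm_inject_model() stands in for kvm_arch_irq_bypass_inject().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a consumer with the proposed inject callback. */
struct bypass_consumer_model {
	/* Returns true if the interrupt was injected directly. */
	bool (*inject)(struct bypass_consumer_model *cons);
	int injected;		/* counts direct injections */
};

#define MAX_CONSUMERS 4

/* Hypothetical model of the producer side (the VFIO interrupt). */
struct bypass_producer_model {
	size_t nr_consumers;
	/* Consumers whose token matches; RCU-protected in the proposal. */
	struct bypass_consumer_model *consumers[MAX_CONSUMERS];
	int eventfd_signals;	/* stand-in for eventfd_signal() calls */
};

static void producer_add_consumer(struct bypass_producer_model *prod,
				  struct bypass_consumer_model *cons)
{
	assert(prod->nr_consumers < MAX_CONSUMERS);
	prod->consumers[prod->nr_consumers++] = cons;
}

/* What the interrupt handler would do under the proposal. */
static void producer_fire(struct bypass_producer_model *prod)
{
	bool handled = false;

	for (size_t i = 0; i < prod->nr_consumers; i++) {
		struct bypass_consumer_model *cons = prod->consumers[i];

		if (cons->inject && cons->inject(cons))
			handled = true;
	}

	/* Fall back to the eventfd path only if no consumer injected. */
	if (!handled)
		prod->eventfd_signals++;
}

/* Stand-in for kvm_arch_irq_bypass_inject(): inject, report success. */
static bool kvm_inject_model(struct bypass_consumer_model *cons)
{
	cons->injected++;	/* where irqfd_wakeup_pollin() would run */
	return true;
}
```

With no consumer registered, firing the producer signals the eventfd once; after a KVM-like consumer is added, the next interrupt is injected directly and the eventfd counter is not touched, which is why I'd expect irqfd_wakeup not to run on that path.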

I noticed the IRQ bypass manager is not merged yet; is there a git branch
for it?

> Or do you think it would be a hack?  The latency improvement might
> actually be even better than what Yunhong is already reporting.

I will be glad to try it.

Thanks
--jyh

> 
> Paolo