Re: [KVM PATCH v2 2/2] kvm: use POLLHUP to close an irqfd instead of an explicit ioctl

"Michael S. Tsirkin" <mst@xxxxxxxxxx> · Sun, 14 Jun 2009 16:28:43 +0300

On Sun, Jun 14, 2009 at 08:53:11AM -0400, Gregory Haskins wrote:
> Michael S. Tsirkin wrote:
> > On Thu, Jun 04, 2009 at 08:48:12AM -0400, Gregory Haskins wrote:
> >   
> >> +static void
> >> +irqfd_disconnect(struct _irqfd *irqfd)
> >> +{
> >> +	struct kvm *kvm;
> >> +
> >> +	mutex_lock(&irqfd->lock);
> >> +
> >> +	kvm = rcu_dereference(irqfd->kvm);
> >> +	rcu_assign_pointer(irqfd->kvm, NULL);
> >> +
> >> +	mutex_unlock(&irqfd->lock);
> >> +
> >> +	if (!kvm)
> >> +		return;
> >>  
> >>  	mutex_lock(&kvm->lock);
> >> -	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 1);
> >> -	kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd->gsi, 0);
> >> +	list_del(&irqfd->list);
> >>  	mutex_unlock(&kvm->lock);
> >> +
> >> +	/*
> >> +	 * It is important to not drop the kvm reference until the next grace
> >> +	 * period because there might be lockless references in flight up
> >> +	 * until then
> >> +	 */
> >> +	synchronize_srcu(&irqfd->srcu);
> >> +	kvm_put_kvm(kvm);
> >>  }
> >>     
> >
> > So irqfd object will persist after kvm goes away, until eventfd is closed?
> >   
> 
> Yep, by design.  It becomes part of the eventfd and is thus associated
> with its lifetime.  Consider it as if we made our own anon-fd
> implementation for irqfd and the lifetime looks similar.  The difference
> is that we are reusing eventfd and its interface semantics.
> >   
> >>  
> >>  static int
> >>  irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
> >>  {
> >>  	struct _irqfd *irqfd = container_of(wait, struct _irqfd, wait);
> >> +	unsigned long flags = (unsigned long)key;
> >>  
> >> -	/*
> >> -	 * The wake_up is called with interrupts disabled.  Therefore we need
> >> -	 * to defer the IRQ injection until later since we need to acquire the
> >> -	 * kvm->lock to do so.
> >> -	 */
> >> -	schedule_work(&irqfd->work);
> >> +	if (flags & POLLIN)
> >> +		/*
> >> +		 * The POLLIN wake_up is called with interrupts disabled.
> >> +		 * Therefore we need to defer the IRQ injection until later
> >> +		 * since we need to acquire the kvm->lock to do so.
> >> +		 */
> >> +		schedule_work(&irqfd->inject);
> >> +
> >> +	if (flags & POLLHUP) {
> >> +		/*
> >> +		 * The POLLHUP is called unlocked, so it theoretically should
> >> +		 * be safe to remove ourselves from the wqh using the locked
> >> +		 * variant of remove_wait_queue()
> >> +		 */
> >> +		remove_wait_queue(irqfd->wqh, &irqfd->wait);
> >> +		flush_work(&irqfd->inject);
> >> +		irqfd_disconnect(irqfd);
> >> +
> >> +		cleanup_srcu_struct(&irqfd->srcu);
> >> +		kfree(irqfd);
> >> +	}
> >>  
> >>  	return 0;
> >>  }
> >>     
> >
> > And it is removed by this function when eventfd is closed.
> > But what prevents the kvm module from going away, meanwhile?
> >   
> 
> Well, we hold a reference to struct kvm until we call
> irqfd_disconnect().  If kvm closes first, we disconnect and disassociate
> all references to kvm leaving irqfd->kvm = NULL.  Likewise, if irqfd
> closes first, we disassociate with kvm with the above quoted logic.  In
> either case, we are holding a kvm reference up until that "disconnect"
> point.  Therefore kvm should not be able to disappear before that
> disconnect, and after that point we do not care.

Yes, we do care.

Here's the scenario in more detail:

- kvm is closed
- irq disconnect is called
- kvm is put
- kvm module is removed: all irqs are disconnected
- eventfd closes and triggers callback into removed kvm module
- crash

> If that is not sufficient to prevent kvm.ko from going away in the
> middle, then IMO kvm_get_kvm() has a bug, not irqfd. ;) However, I
> believe everything is actually ok here.
> 
> -Greg
> 

BTW, why can't we remove irqfds in kvm_release?

-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html