Re: [UNTESTED] KVM: do not call kvm_set_irq from irq disabled section

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Apr 21, 2010 at 09:38:39PM +0300, Gleb Natapov wrote:
> On Wed, Apr 21, 2010 at 03:29:11PM -0300, Marcelo Tosatti wrote:
> > On Wed, Apr 21, 2010 at 08:58:48PM +0300, Gleb Natapov wrote:
> > > On Wed, Apr 21, 2010 at 02:37:34PM -0300, Marcelo Tosatti wrote:
> > > > On Wed, Apr 21, 2010 at 08:12:27PM +0300, Gleb Natapov wrote:
> > > > > On Wed, Apr 21, 2010 at 12:58:41PM -0300, Marcelo Tosatti wrote:
> > > > > > > Or could we make kvm_set_irq() atomic? Though the code path is a little long 
> > > > > > > for spinlock.
> > > > > > 
> > > > > > Yes, given the sleep-inside-RCU-protected section bug from
> > > > > > kvm_notify_acked_irq, either that or convert IRQ locking to SRCU.
> > > > > > 
> > > > > > But as you said, the code paths are long and potentially slow, so
> > > > > > probably SRCU is a better alternative.
> > > > > > 
> > > > > > Gleb?
> > > > > kvm_set_irq() was converted to rcu from mutex to make msix interrupt
> > > > > injection scalable.
> > > > 
> > > > We meant ioapic lock. See the last report from Ralf on this thread. 
> > > Can we solve the problem by calling ack notifier outside rcu read
> > > section in kvm_notify_acked_irq()?
> > 
> > The unregister path does
> > 
> > - remove_from_list(entry)
> > - synchronize_rcu
> > - kfree(entry)
> > 
> > So if kvm_notify_acked_irq sleeps, synchronize_rcu can succeed, and the
> > notifier entry can be freed.
> What I mean is kvm_notify_acked_irq() will iterate over all ack entries
> in rcu read protected section, but instead of calling callback it will
> collect them into the array and call them outside rcu read section. At
> this point it doesn't matter if entry is unregistered since the copy is
> used to actually call the notifier.

Here it is, but no, this trick can't be done safely for ack notifiers
because they are unregistered at runtime by device assignment.

How does the freeing path knows it can proceed to free its entry if
there is no reliable way to know if there is a reference to itself?
(think "priv" gets freed between rcu_read_unlock and ->irq_acked with
the patch below).

I can convert to SRCU if you have no objections.

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 0150aff..900ac05 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -241,8 +241,8 @@ int pit_has_pending_timer(struct kvm_vcpu *vcpu)
 
 static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier *kian)
 {
-	struct kvm_kpit_state *ps = container_of(kian, struct kvm_kpit_state,
-						 irq_ack_notifier);
+	struct kvm_kpit_state *ps = kian->priv;
+
 	raw_spin_lock(&ps->inject_lock);
 	if (atomic_dec_return(&ps->pit_timer.pending) < 0)
 		atomic_inc(&ps->pit_timer.pending);
@@ -636,6 +636,7 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
 		     CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
 	pit_state->irq_ack_notifier.gsi = 0;
 	pit_state->irq_ack_notifier.irq_acked = kvm_pit_ack_irq;
+	pit_state->irq_ack_notifier.priv = &pit->pit_state;
 	kvm_register_irq_ack_notifier(kvm, &pit_state->irq_ack_notifier);
 	pit_state->pit_timer.reinject = true;
 	mutex_unlock(&pit->pit_state.lock);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ce027d5..6c3fb06 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -404,6 +404,7 @@ struct kvm_irq_ack_notifier {
 	struct hlist_node link;
 	unsigned gsi;
 	void (*irq_acked)(struct kvm_irq_ack_notifier *kian);
+	void *priv;
 };
 
 #define KVM_ASSIGNED_MSIX_PENDING		0x1
diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
index 4d10b1e..779b749 100644
--- a/virt/kvm/assigned-dev.c
+++ b/virt/kvm/assigned-dev.c
@@ -121,8 +121,7 @@ static void kvm_assigned_dev_ack_irq(struct kvm_irq_ack_notifier *kian)
 	if (kian->gsi == -1)
 		return;
 
-	dev = container_of(kian, struct kvm_assigned_dev_kernel,
-			   ack_notifier);
+	dev = kian->priv;
 
 	kvm_set_irq(dev->kvm, dev->irq_source_id, dev->guest_irq, 0);
 
@@ -563,6 +562,7 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
 	match->irq_source_id = -1;
 	match->kvm = kvm;
 	match->ack_notifier.irq_acked = kvm_assigned_dev_ack_irq;
+	match->ack_notifier.priv = match;
 	INIT_WORK(&match->interrupt_work,
 		  kvm_assigned_dev_interrupt_work_handler);
 
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index a0e8880..ebccea8 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -175,11 +175,14 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level)
 	return ret;
 }
 
+#define MAX_ACK_NOTIFIER_PER_GSI 4
+
 void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
 {
 	struct kvm_irq_ack_notifier *kian;
+	struct kvm_irq_ack_notifier acks[MAX_ACK_NOTIFIER_PER_GSI];
 	struct hlist_node *n;
-	int gsi;
+	int gsi, i, acks_nr = 0;
 
 	trace_kvm_ack_irq(irqchip, pin);
 
@@ -188,9 +191,15 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
 	if (gsi != -1)
 		hlist_for_each_entry_rcu(kian, n, &kvm->irq_ack_notifier_list,
 					 link)
-			if (kian->gsi == gsi)
-				kian->irq_acked(kian);
+			if (kian->gsi == gsi) {
+				if (acks_nr == MAX_ACK_NOTIFIER_PER_GSI)
+					break;
+				acks[acks_nr++] = *kian;
+			}
 	rcu_read_unlock();
+
+	for (i = 0; i < acks_nr; i++)
+		acks[i].irq_acked(&acks[i]);
 }
 
 void kvm_register_irq_ack_notifier(struct kvm *kvm,
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux