One of our guys is working on a problem with Linux as a guest domain and multiple vnets bound to the same vswitch:

	ldm add-vnet vnet0 primary-vsw1 ldom1
	ldm add-vnet vnet1 primary-vsw1 ldom1

When configured this way, Linux does not boot properly; it winds up perpetually servicing vnet interrupts.

The fix he sent me is below, but I am very suspicious of it, as it essentially removes the linked list of buckets that the irq code walks.

I think the real problem is that, when multiple vnet devices are on the same vswitch, the interrupts get re-enabled "too soon", so the sun4v_ivec.S code ends up making the linked list circular (by queueing work for a vnet cookie that is already queued).

It seems a little odd to me, though, that the sun4v_ivec.S code builds the list backwards (by pushing each entry onto the head of the chain after setting its "next" to the current head). I guess this is just because it is too hard to add to the end of the list. I would probably do this with the assembly code equivalent of:

	new->next = NULL;
	*tail = new;
	tail = &new->next;

but this requires adding a "tail" pointer to the per-cpu block, and of course updating it in the irq_64.c code. (Note that we are still working with older code that has a sparc64 directory.) A rough C sketch of this tail-pointer idea is appended after the patch.

(This patch also includes code to handle irq redistribution on ldom guests that have a small number of CPUs, with a kernel built to run on many more CPUs. It is still not ideal, as it does not do chip-then-strand distribution; I'm just including it here out of laziness :-) / a desire not to break the diffs.)

Chris

diff --git a/arch/sparc64/kernel/sun4v_ivec.S b/arch/sparc64/kernel/sun4v_ivec.S
index e2f8e1b..edcd71e 100644
--- a/arch/sparc64/kernel/sun4v_ivec.S
+++ b/arch/sparc64/kernel/sun4v_ivec.S
@@ -108,8 +108,7 @@ sun4v_dev_mondo:
 	sllx	%g3, 4, %g3
 	add	%g4, %g3, %g4
 
-1:	ldx	[%g1], %g2
-	stxa	%g2, [%g4] ASI_PHYS_USE_EC
+1:	stxa	%g0, [%g4] ASI_PHYS_USE_EC
 	stx	%g4, [%g1]
 
 	/* Signal the interrupt by setting (1 << pil) in %softint. */
diff --git a/arch/sparc64/kernel/irq.c b/arch/sparc64/kernel/irq.c
index 7872476..2584b9e 100644
--- a/arch/sparc64/kernel/irq.c
+++ b/arch/sparc64/kernel/irq.c
@@ -70,6 +70,7 @@ static unsigned long bucket_get_chain_pa(unsigned long bucket_pa)
 	return ret;
 }
 
+#if 0 /* This one is no longer needed. */
 static void bucket_clear_chain_pa(unsigned long bucket_pa)
 {
 	__asm__ __volatile__("stxa	%%g0, [%0] %1"
@@ -79,6 +80,7 @@ static void bucket_clear_chain_pa(unsigned long bucket_pa)
 				    __irq_chain_pa)),
 			       "i" (ASI_PHYS_USE_EC));
 }
+#endif
 
 static unsigned int bucket_get_virt_irq(unsigned long bucket_pa)
 {
@@ -251,7 +253,7 @@ static int irq_choose_cpu(unsigned int virt_irq)
 	cpumask_t mask = irq_desc[virt_irq].affinity;
 	int cpuid;
 
-	if (cpus_equal(mask, CPU_MASK_ALL)) {
+	if (cpus_subset(mask, CPU_MASK_ALL)) {
 		static int irq_rover;
 		static DEFINE_SPINLOCK(irq_rover_lock);
 		unsigned long flags;
@@ -735,10 +737,12 @@ void handler_irq(int irq, struct pt_regs *regs)
 
 		next_pa = bucket_get_chain_pa(bucket_pa);
 		virt_irq = bucket_get_virt_irq(bucket_pa);
-		bucket_clear_chain_pa(bucket_pa);
 
 		desc = irq_desc + virt_irq;
 
+		if (desc->chip->set_affinity)
+			desc->chip->set_affinity(virt_irq, cpu_online_map);
+
 		desc->handle_irq(virt_irq, desc);
 
 		bucket_pa = next_pa;
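
P.S. Here is the tail-pointer idea spelled out as a rough C sketch. The names (fake_bucket, fake_percpu_work, queue_head, queue_tail) are made up purely for illustration; the real thing would stay as assembly in sun4v_ivec.S working on physical addresses, plus a new "tail" field in the per-cpu block, so take this as the shape of the change rather than a patch.

/*
 * Rough sketch only -- simplified, hypothetical names; the real
 * equivalents are the bucket chain (__irq_chain_pa) and the per-cpu
 * block, and the queueing side is assembly in sun4v_ivec.S using
 * ASI_PHYS_USE_EC stores.
 */
#include <stddef.h>

struct fake_bucket {
	struct fake_bucket	*chain;		/* __irq_chain_pa in the real code */
	unsigned int		virt_irq;
};

struct fake_percpu_work {
	struct fake_bucket	*head;		/* what the ivec code updates today */
	struct fake_bucket	**tail;		/* the extra pointer I'd add; starts at &head */
};

/*
 * What sun4v_dev_mondo effectively does now: push the bucket onto the
 * head of the per-cpu chain, so handler_irq() walks the entries in
 * reverse arrival order.  If a bucket that is already on the chain is
 * queued again, its chain pointer gets set to a head that already
 * reaches it, the list goes circular, and the walk never terminates.
 */
static void queue_head(struct fake_percpu_work *w, struct fake_bucket *b)
{
	b->chain = w->head;	/* ldx  [%g1], %g2 ; stxa %g2, [%g4] */
	w->head  = b;		/* stx  %g4, [%g1] */
}

/*
 * What I would do instead: append at the tail so the chain stays in
 * arrival order.  Requires w->tail to start out as &w->head and to be
 * reset to that whenever handler_irq() empties the chain.
 */
static void queue_tail(struct fake_percpu_work *w, struct fake_bucket *b)
{
	b->chain = NULL;
	*w->tail = b;
	w->tail  = &b->chain;
}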