RE: [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision

"Brown, Len" <len.brown@xxxxxxxxx> · Thu, 4 May 2006 12:04:19 -0400

>>>>> We should never have had a problem with un-connected 
>interrupt lines
>>>>> consuming vectors, as the vectors are handed out at run-time
>>>>> only when interrupts are requested by drivers.
>>>>
>>>>Incorrect.  By being requested by drivers I assume you mean by
>>>>request_irq.  assign_irq_vector is called long before request_irq
>>>>and in fact there is currently not a call from request_irq to
>>>>assign_irq_vector.  Look where assign_irq_vector is called in 
>>>io_apic.c
>>
>> Hmmm, on Natalie's ping, I looked at this again.
>>
>> setup_IO_APIC_irqs() actually consumes only 16 vectors.
>> This is because it only sets up the IRQs for pins
>> found with find_irq_entry(), which searches mp_irqs[]
>> which at this point contains only the legacy identify mappings.
>
>In the case of ACPI.  I think the mptable case has all of information
>in mp_irqs at that point.

Agreed, I just sent a note on this, but apparently it "crossed
in the mail" with yours.  The key point about MPS is that MPS
should not describe pins that can never be connected -- so that isn't
quite as bad as handing out a vector for every RTE, which is what the
code appears to do on first read...

>> The PNP code then takes a swing at things, and it registers
>> a handful of gsi's, but they are all duplicates of the legacy
>> mappings already set-up, so no additional vectors are consumed.
>>
>> So on a big system, the large quantity of vectors will not 
>be consumed
>> at
>> IOAPIC initialization time, but later at device probe time.
>> Not via request_irq(), but via pci_enable_device():
>>
>> pci_enable_device()
>>  pci_enable_device_bars()
>>   pcibios_enable_device()
>>    pcibios_enable_irq() -> acpi_pci_irq_enable()
>>     acpi_register_gsi()
>>      mp_register_gsi()
>>       io_apic_set_pci_routing()
>>        entry.vector = assign_irq_vector(irq)
>>
>> So except for the legacy IRQs, we are already allocating the vectors
>> on-demand, and that doesn't need to be fixed.
>
>I agree with the fact, we do allocate the vectors on-demand.
>Since the allocation is not allowed to fail, and because
>it seems to be an accident of the implementation rather
>than a deliberate implementation detail I still think it
>needs to be fixed so the code is less brittle.

Yeah, it isn't clear that this has any advantage over assigning
the vector at request_irq() time where one would expect to see it.
Though some might consider "currently working" an advantage:-)

>But if we are not afraid of breaking machines with more
>that 243 interrupt sources (which currently force ioapic/pin
>combinations to share irqs today) it does mean we can move
>the removal of the irq to gsi mapping up, in the patch series.  We
>first need to raise the limit on the number of IRQs on x86.

No, I don't think we have the license to intentionally break big
machines
that are currently working.  In the long run, these two big-machine
hacks should go away:

mp_register_gsi()
	/*
	 * For PCI devices, assign IRQs in order, avoiding gaps
	 * due to unused I/O APIC pins.
	 */
	...

io_apic_set_pci_routing() (x86_64 only upstream, i386 too on SuSE)
	irq = gsi_irq_sharing(irq)

I think what we can do in the short term is to make these workarounds
not have any effect on the systems which don't need them.  This means
searching like gsi_irq_sharing() does, instead of always compressing
like mp_register_gsi() does.  It also means not printing dmesg
about vector sharing when no sharing is actually happening.

>We can't raise the limit on the number of IRQs when msi is
>enabled but enabling CONFIG_PCI_MSI already has problem in this
>area and that is an independent fix.
>
>Since we unnecessarily break big machines I think this needs to sit
>in -mm until the window for 2.6.18 opens up.  By that time we should
>be able to get all of the pieces in to keep the largest of machines
>working well into the future.
>
>That includes fixing msi to allocate irqs, fixing the irq affinity
>calls to be race free, and making the vector allocation to be
>a per cpu allocation.
>
>I still think we need to do everything I outlined previously if
>for no other reason than to leave us with a maintainable code
>base that works for obvious reasons.  But since the dependencies
>are not as bad as they previously appeared, it does mean we can
>take the patches in a different order.

Based on past history of the un-intended impact of interrupt changes,
(eg. what started this thread)
I would suggest that only the simplest things go into 2.6.18
and that the larger changes stay in -mm for all of 2.6.18
and targtet 2.6.19.

-Len
-
To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html