Re: [PATCH] sysfs: add per pci device msi[x] irq listing (v3)

On Mon, Sep 19, 2011 at 11:47:15AM -0400, Neil Horman wrote:
> So a while back, I wanted to provide a way for irqbalance (and other apps) to
> definitively map irqs to devices, which, for msi[x] irqs is currently not really
> possible in user space.  My first attempt didn't go so well:
> https://lkml.org/lkml/2011/4/21/308
> 
> It was plagued by the same issues that prior attempts were, namely that it
> violated the one-file-one-value sysfs rule.  I wandered off but have recently
> come back to this.  I've got a new implementation here that exports a new
> subdirectory for every pci device, called msi_irqs.  This subdirectory contains
> a variable number of numbered subdirectories, in which the number represents an
> msi irq.  Each numbered subdirectory contains attributes for that irq, which
> currently is only the mode it is operating in (msi vs. msix).  I think fits
> within the constraints sysfs requires, and will allow irqbalance to properly map
> msi irqs to devices without having to rely on rickety, best guess methods like
> interface name matching.
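
For reference, a userspace consumer (irqbalance or similar) could walk the
proposed layout, /sys/bus/pci/devices/<dev>/msi_irqs/<irq>/mode, roughly like
this.  A hedged sketch: the msi_irqs directory only exists on kernels carrying
this patch, and the device path in the usage note below is illustrative.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>

/*
 * Scan the proposed per-device msi_irqs directory.  Each numbered
 * subdirectory is an MSI irq; its "mode" attribute is "msi" or "msix".
 * Returns the number of MSI irqs found (0 if the directory is absent).
 */
int list_msi_irqs(const char *dev)
{
	char path[512], mode[16];
	struct dirent *e;
	int count = 0;
	DIR *d;

	snprintf(path, sizeof(path), "%s/msi_irqs", dev);
	d = opendir(path);
	if (!d)
		return 0;
	while ((e = readdir(d)) != NULL) {
		if (!isdigit((unsigned char)e->d_name[0]))
			continue;	/* skip "." and ".." */
		snprintf(path, sizeof(path), "%s/msi_irqs/%s/mode",
			 dev, e->d_name);
		FILE *f = fopen(path, "r");
		if (!f)
			continue;
		if (fgets(mode, sizeof(mode), f))
			printf("irq %s: %s", e->d_name, mode);
		fclose(f);
		count++;
	}
	closedir(d);
	return count;
}
```

Called as list_msi_irqs("/sys/bus/pci/devices/0000:01:00.0"), this would print
one "irq N: mode" line per vector, giving an unambiguous irq-to-device mapping
with no interface-name guessing.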

This approach feels like building bigger rockets instead of a space
elevator :-)

What we need is to allow device drivers to ask for per-CPU interrupts,
and implement them in terms of MSI-X.  I've made a couple of stabs at
implementing this, but haven't got anything working yet.  It would solve
a number of problems:

1. NUMA cacheline fetch.  At the moment, desc->istate gets modified by
handle_edge_irq.  handle_percpu_irq doesn't need to worry about any
of that stuff, so doesn't touch desc->istate.  I've heard this is a
significant problem for the high-speed networking people.

2. /proc/interrupts is unmanageable on large machines.  There are hundreds
of interrupts and dozens of CPUs.  This would go a long way towards reducing
the number of rows in the table (it doesn't do anything about the columns).

ie instead of this:

 79:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1
 80:          0          0    9275611          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-0
 81:          0          0    9275611          0          0          0          0          0   PCI-MSI-edge      eth1-TxRx-1
 82:          0          0          0          0    9275611          0          0          0   PCI-MSI-edge      eth1-TxRx-2
 83:          0          0          0          0    9275611          0          0          0   PCI-MSI-edge      eth1-TxRx-3
 84:          0          0          0          0          0    9275611          0          0   PCI-MSI-edge      eth1-TxRx-4
 85:          0          0          0          0          0    9275611          0          0   PCI-MSI-edge      eth1-TxRx-5
 86:          0          0          0          0          0          0    9275611          0   PCI-MSI-edge      eth1-TxRx-6
 87:          0          0          0          0          0          0    9275611          0   PCI-MSI-edge      eth1-TxRx-7

We'd get this:

 79:          0          0          0          0          0          0          0          0   PCI-MSI-edge      eth1
 80:    9275611    9275611    9275611    9275611    9275611    9275611    9275611    9275611   PCI-MSI-edge      eth1-TxRx

3. /proc/irq/x/smp_affinity actually makes sense again.  It can be a
mask of which interrupts are active instead of being a degenerate case
in which only the lowest set bit is actually honoured.

4. Easier to manage for the device driver.  All it needs is to call
request_percpu_irq(...) instead of trying to figure out how many
threads/cores/numa nodes/... there are in the machine, and how many
other multi-interrupt devices there are; and thus how many interrupts
it should allocate.  That can be left to the interrupt core which at
least has a chance of getting it right.
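
The driver side of the interface sketched above might look like this.  A
hypothetical sketch only: request_percpu_irq() backed by MSI-X doesn't exist
yet, and the handler and structure names are invented for illustration.

```c
/* Hypothetical driver use of a per-CPU MSI-X interrupt. */
static irqreturn_t eth1_txrx_handler(int irq, void *dev_id)
{
	/* Each CPU services only its own queue, and handle_percpu_irq
	 * never touches shared state like desc->istate. */
	return IRQ_HANDLED;
}

static int eth1_setup_irqs(struct pci_dev *pdev)
{
	/* One call replaces the num_online_cpus()/pci_enable_msix()
	 * guesswork; the irq core picks the vector count. */
	return request_percpu_irq(pdev->irq, eth1_txrx_handler,
				  "eth1-TxRx", eth1_percpu_data);
}
```

The design point is that the per-CPU fan-out becomes the interrupt core's
problem rather than being re-solved (badly) in every multiqueue driver.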

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

