On Fri, Aug 07, 2020 at 09:06:50AM -0300, Jason Gunthorpe wrote:
> On Thu, Aug 06, 2020 at 10:21:11PM +0200, Thomas Gleixner wrote:
>
> > Optionally? Please tell the hardware folks to make this mandatory. We
> > have enough pain with non maskable MSI interrupts already so introducing
> > yet another non maskable interrupt trainwreck is not an option.
>
> Can you elaborate on the flows where Linux will need to trigger
> masking?
>
> I expect that masking will be available in our NIC HW too - but it
> will require a spin loop if masking has to be done in an atomic
> context.
>
> > It's more than a decade now that I tell HW people not to repeat the
> > non-maskable MSI failure, but obviously they still think that
> > non-maskable interrupts are a brilliant idea. I know that HW folks
> > believe that everything they omit can be fixed in software, but they
> > have to finally understand that this particular issue _cannot_ be fixed
> > at all.
>
> Sure, the CPU should always be able to shut off an interrupt!
>
> Maybe explaining the goals would help understand the HW perspective.
>
> Today HW can process > 100k queues of work at once. Interrupt delivery
> works by having a MSI index in each queue's metadata and the interrupt
> indirects through a MSI-X table on-chip which has the
> addr/data/mask/etc.
>
> What IMS proposes is that the interrupt data can move into the queue
> meta data (which is not required to be on-chip), eg along side the
> producer/consumer pointers, and the central MSI-X table is not
> needed. This is necessary because the PCI spec has very harsh design
> requirements for a MSI-X table that make scaling it prohibitive.
>
> So an IRQ can be silenced by deleting or stopping the queue(s)
> triggering it. It can be masked by including masking in the queue
> metadata. We can detect pending by checking the producer/consumer
> values.
>
> However synchronizing all the HW and all the state is now more
> complicated than just writing a mask bit via MMIO to an on-die memory.

Because doing all of the work that used to be done in HW in software is
so much faster and more scalable? Feels really wrong to me :(

Do you all have a pointer to the spec for this newly proposed stuff
anywhere to try to figure out how the HW wants this to all work?

thanks,

greg k-h