On Thu, 2023-07-13 at 12:03 -0500, Bjorn Helgaas wrote: > On Thu, Jun 29, 2023 at 11:04:50PM -0400, Radu Rendec wrote: > > On Thu, 2023-06-29 at 17:11 -0500, Bjorn Helgaas wrote: > > ... > > > > If converting from chained to normal handlers can be done safely, I > > > would definitely be in favor if doing it across all of drivers/pci/ at > > > once so they're all consistent. Otherwise that code just gets copied > > > to new drivers, so the issue persists and spreads. > > > > I think the conversion can be done safely, meaning that it won't break > > the drivers. And by the way, there are other IRQ drivers (outside the > > PCI space) that use chained interrupts. > > > > Unfortunately, it seems we are going in circles. Chained interrupts are > > bad because they let IRQ storms go unnoticed and lock up the system, > > but converting them to regular interrupts is also bad because it breaks > > the userspace ABI. > > > > I am willing to help clean up this mess, but I think first we need to > > come up with a solution that's acceptable for everybody. I was hoping > > Marc and Thomas would chime in, but unfortunately that hasn't happened > > yet - other than each of them pointing out (separately) what is wrong > > with each approach. > > I don't think Marc or Thomas are going to chime in with a fully-formed > solution. I think to make progress, you (or Pali, or somebody) will > have to try to address Marc and Thomas' comments, make a proposal, and > we can iterate on it. That crossed my mind too. Unfortunately, Marc's and Thomas' comments are contradictory, or at least that's my interpretation. I don't expect them to come up with a fully-formed solution, but merely to agree upon something that the rest of us can follow. Otherwise, I think no matter what we may come up with, at least one of them will dismiss it. They made very clear points, and I understand both. I just can't see a common denominator. Let me elaborate a bit. Thomas made it very clear that we should get rid of chained interrupts altogether and suggested to use regular interrupts instead. And since all regular interrupts are visible in procfs by default, so is their affinity control interface. And with that you can now change the affinity of a parent interrupt and it will also affect the affinity of all child interrupts. That would break the promise that the procfs interface currently makes, which is that setting the affinity on an interrupt will affect *only* that particular interrupt and nothing else (that is Marc's point). The only solution that comes to mind is this: * Add support for tracking parent-child interrupt relationships. * Modify the existing procfs affinity control interface to reject changing the affinity of a parent interrupt (i.e. an interrupt that has at least one child interrupt associated). * Convert chained interrupts to regular interrupts as needed. * Create a new sysfs affinity control interface that allows setting the affinity of any interrupt, including parent interrupts. * Expose the parent-child interrupt relationships in sysfs, so any program that is aware of the new interface can go to the root interrupt to set the affinity. To be honest, I think this approach would make things even messier and more confusing than they are today. And I'm not even sure it would not break the procfs interface backwards compatibility in a different way. Of course, any comments or suggestions are welcome and would be appreciated! Best regards, Radu