On Wed, Mar 19 2025 at 21:58, Wen Xiong wrote:
>> The real problem has nothing to do with a remove/add operation. The
>> problem is solely in the probe function.
>
> I don't think we have problems in probe function since this driver has
> been in productions for many many years.

Seriously? It does not matter at all whether you had it many years in
production or not. Fact is that the driver is operational and after that
a device reset happens, which wipes the config space. That _IS_ the
problem.

> Also we didn't see the issue before the "MSI domain" patchset dropping
> into linux interrupt code(no issue in rhel92 release).

That's completely irrelevant. See above.

> Device reset is not called in probe function.

Right. The reset is part of PCI error handling, which happens _AFTER_
the driver has set up interrupts.

> We don't see the issue without dynamically remove/add operation.
> There is a small window which irqbalance daemon kicks in during device
> reset. So it took about over 6 hours to recreate the issue when doing
> remove/add loop operation.

Sure. You need a loop to hit the window. And it does not matter whether
it's the probe or the remove which triggers it. Fact is that the reset
wipes out the config space, which means that any read from the config
space between reset and restore will return garbage.

That problem is not restricted to the interrupt code. It's a general
problem.

> We can't find the good way to fix the issue in both of device drivers.
> So we look for some help in interrupt code.

No. This is _NOT_ an interrupt specific problem. You are observing the
symptom related to interrupts, but any other code which reads from
config space during the reset window has exactly the same problem.

The PCI error handling resets the device asynchronously to any other
operation which might access the config space.

Yes, set_affinity() is one possible way to hit that due to the
implementation detail of pseries_msi_compose_msg(), which reads the MSI
message composed by the underlying hypervisor back from config space.

But even if it would not read back and compose the message itself then
set_affinity() would create inconsistent state because:

	reset()
	compose()
	write()
	restore()

I.e. the reset machinery overwrites the new message, which means this
ends up with inconsistent state.

So this is a general problem with PCI error handling and _not_ a problem
of the interrupt subsystem.

I have no idea what to do about that, but this needs to be looked at
from the PCI error handling side and not papered over at the messenger.

Thanks,

        tglx
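
For illustration only, the reset()/compose()/write()/restore() ordering
above can be modelled in a few lines of user-space C. This is a sketch
under stated assumptions, not kernel code and not how the pseries or PCI
error handling code is implemented: every name in it (model_dev,
model_reset(), model_restore(), model_race() and so on) is hypothetical,
the "config space" is just an array, and a wiped register is assumed to
read back as all ones. It only shows the two effects described above: a
read inside the reset window sees garbage, and a write inside the window
is lost once the saved state is restored.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a tiny config space with one MSI message word. */
#define CFG_WORDS   4
#define MSI_MSG_IDX 1              /* hypothetical slot of the MSI message */
#define CFG_WIPED   0xFFFFFFFFu    /* assumption: wiped registers read as all ones */

struct model_dev {
	uint32_t cfg[CFG_WORDS];   /* live config space */
	uint32_t saved[CFG_WORDS]; /* copy saved while the device was operational */
};

struct race_result {
	uint32_t read_in_window;   /* value seen between reset and restore */
	uint32_t final_msg;        /* value in config space after restore */
};

/* Device is operational: message programmed and state saved. */
void model_init(struct model_dev *d, uint32_t msg)
{
	memset(d->cfg, 0, sizeof(d->cfg));
	d->cfg[MSI_MSG_IDX] = msg;
	memcpy(d->saved, d->cfg, sizeof(d->saved));
}

/* Error handling, first half: the reset wipes the live config space. */
void model_reset(struct model_dev *d)
{
	for (int i = 0; i < CFG_WORDS; i++)
		d->cfg[i] = CFG_WIPED;
}

/* Error handling, second half: the saved state is written back. */
void model_restore(struct model_dev *d)
{
	memcpy(d->cfg, d->saved, sizeof(d->cfg));
}

uint32_t model_read(const struct model_dev *d, int idx)
{
	assert(idx >= 0 && idx < CFG_WORDS);
	return d->cfg[idx];
}

void model_write(struct model_dev *d, int idx, uint32_t val)
{
	assert(idx >= 0 && idx < CFG_WORDS);
	d->cfg[idx] = val;
}

/* Replays the interleaving: reset(), read, write(new), restore(). */
struct race_result model_race(uint32_t old_msg, uint32_t new_msg)
{
	struct model_dev d;
	struct race_result r;

	model_init(&d, old_msg);
	model_reset(&d);
	r.read_in_window = model_read(&d, MSI_MSG_IDX); /* sees wiped value */
	model_write(&d, MSI_MSG_IDX, new_msg);          /* affinity-style update */
	model_restore(&d);                              /* clobbers new_msg */
	r.final_msg = model_read(&d, MSI_MSG_IDX);
	return r;
}
```

Calling model_race(0x1111, 0x2222) in this model reports the wiped
value for the read inside the window and the old message 0x1111 as the
final content, i.e. the update made during the window is gone. Nothing
in the model is interrupt specific; any reader or writer placed between
model_reset() and model_restore() is affected the same way.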