On Thu, Mar 20 2025 at 09:23, Thomas Gleixner wrote:
> On Wed, Mar 19 2025 at 21:58, Wen Xiong wrote:
>> We don't see the issue without dynamically remove/add operation.
>> There is a small window which irqbalance daemon kicks in during device
>> reset. So it took about over 6 hours to recreate the issue when doing
>> remove/add loop operation.
>
> Sure. You need a loop to hit the window. And it does not matter whether
> it's the probe or the remove which triggers it. Fact is that the reset
> wipes out the config space, which means that any read from the config
> space between reset and restore will return garbage. That problem is not
> restricted to the interrupt code. It's a general problem.

After looking at the code again, it's a problem in the remove() function:

  __ipr_remove()
    ipr_initiate_ioa_bringdown()
      // resets device
      restore_config_space()
    ....
    ipr_free_all_resources()
      free_irqs()

So yes, it's not probe(). But the question is pretty much the same. Why
is a reset issued while the driver is fully operational and resources
are still in use?

Don't even think about telling me that this is a problem of the MSI
interrupt rework. It is not. It's been broken forever.

You _cannot_ pull the rug out from under a fully operational driver and
expect that all involved parts will just magically handle this
gracefully.

What about tearing down resources first and then issuing the reset?

Thanks,

        tglx
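
To illustrate the ordering argument, here is a toy userspace model in
plain C. It is not ipr code and not kernel code: struct toy_dev, the
toy_*() helpers and both remove_*() variants are made up for
illustration, and the irqbalance access is modelled as a plain function
call placed inside the reset window. It only shows why the order of
"reset" and "free the interrupts" matters.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CFG_WORDS   4
#define CFG_MAGIC   0x1234abcdu  /* hypothetical valid config word */
#define CFG_GARBAGE 0xffffffffu  /* what a read returns during reset */

struct toy_dev {
	uint32_t cfg[CFG_WORDS];   /* live config space */
	uint32_t saved[CFG_WORDS]; /* copy taken before the reset */
	int irqs_allocated;        /* interrupts still set up? */
};

static void toy_init(struct toy_dev *d)
{
	for (int i = 0; i < CFG_WORDS; i++)
		d->cfg[i] = CFG_MAGIC;
	memset(d->saved, 0, sizeof(d->saved));
	d->irqs_allocated = 1;
}

static uint32_t toy_cfg_read(const struct toy_dev *d, int idx)
{
	assert(idx >= 0 && idx < CFG_WORDS);
	return d->cfg[idx];
}

/* Reset wipes the config space until toy_reset_end() restores it. */
static void toy_reset_begin(struct toy_dev *d)
{
	memcpy(d->saved, d->cfg, sizeof(d->cfg));
	for (int i = 0; i < CFG_WORDS; i++)
		d->cfg[i] = CFG_GARBAGE;
}

static void toy_reset_end(struct toy_dev *d)
{
	memcpy(d->cfg, d->saved, sizeof(d->cfg));
}

static void toy_free_irqs(struct toy_dev *d)
{
	d->irqs_allocated = 0;
}

/*
 * Stand-in for an affinity change coming from user space.
 * Returns 0 if nothing went wrong, -1 if it read garbage.
 */
static int toy_set_affinity(const struct toy_dev *d)
{
	if (!d->irqs_allocated)
		return 0; /* nothing left to touch */
	return toy_cfg_read(d, 0) == CFG_MAGIC ? 0 : -1;
}

/* Ordering as in the call chain above: reset first, free later. */
static int remove_reset_first(struct toy_dev *d)
{
	int rc;

	toy_reset_begin(d);
	rc = toy_set_affinity(d); /* lands inside the reset window */
	toy_reset_end(d);
	toy_free_irqs(d);
	return rc;
}

/* Suggested ordering: tear down resources, then reset. */
static int remove_teardown_first(struct toy_dev *d)
{
	int rc;

	toy_free_irqs(d);
	toy_reset_begin(d);
	rc = toy_set_affinity(d); /* same window, but no interrupt left */
	toy_reset_end(d);
	return rc;
}
```

In this model remove_reset_first() reports the garbage read, while
remove_teardown_first() does not, because there is no interrupt left to
reconfigure once the reset starts. The real driver obviously has more
state to quiesce than a single flag.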