On Thu, Jul 28, 2016 at 02:15:48PM +0800, Yun Wu (Abel) wrote:
> Hi Neil et al,
>
> The question comes from commit 93ed801, in which a condition was added to
> judge whether irqbalance needs to rescan.
>
>     /* IRQ removed and reinserted, need restart or this will
>      * cause an overflow and IRQ won't be rebalanced again
>      */
>     if (count < info->irq_count) {
>         need_rescan = 1;
>         break;
>     }
>
> This works well for most situations, but not all. If, during one
> SLEEP_INTERVAL, an IRQ is removed and reinserted as the above comment
> describes, AND the number of times the IRQ is serviced after reinsertion
> grows larger than the count recorded before removal, the check does not
> fire and the IRQ can hardly be rebalanced again. This problem shows up
> only very occasionally in my recent hotplug tests, but when it hits
> performance-critical IRQs it is undoubtedly a disaster.
>
> The problem can be even worse when the two IRQs, the removed one and the
> reinserted one, belong to different kinds of devices, in which case the
> wrong balance policies might be used.
>
> To solve this problem, I think we can make efforts in two aspects
> (given the removed IRQ is A and the reinserted one is B):
> a) If A != B, set need_rescan to 1. This can be achieved by comparing the
>    two IRQs' name strings.
> b) If A == B, we simply treat this as a modification of its affinity. An
>    unexpected modification of affinity can cause inconsistency between the
>    IRQ's real affinity and the affinity recorded inside irqbalance's data
>    structure, leading to inappropriate load calculation.
>
> I haven't yet figured out a proper way to solve the inconsistency. Or is
> there already a solution that I missed?
>
> Any comments are appreciated.
>
> Thanks,
> Abel
>

Yeah, you look to be right. My first thought is to be heavy-handed and use
the listening interface in libudev to detect hotplug events, and just set
need_rescan any time we get one.

Thoughts?
Neil
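
For illustration, the count check from commit 93ed801 combined with Abel's proposal (a) could be sketched roughly as below. This is not irqbalance code: `struct irq_sample` and `irq_needs_rescan()` are hypothetical names, and the sketch assumes the IRQ's name string from the previous scan is kept alongside its count.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for irqbalance's per-IRQ record. */
struct irq_sample {
    char name[64];       /* name string recorded at the previous scan */
    uint64_t irq_count;  /* interrupt count recorded at the previous scan */
};

/*
 * Return 1 if a rescan looks necessary, 0 otherwise.
 * cur_name and cur_count are the values read for the same IRQ number
 * in the current scan.
 */
static int irq_needs_rescan(const struct irq_sample *prev,
                            const char *cur_name, uint64_t cur_count)
{
    /* Existing check: the counter went backwards, so the IRQ was
     * removed and reinserted since the last scan. */
    if (cur_count < prev->irq_count)
        return 1;

    /* Proposal (a): same IRQ number but a different name means a
     * different IRQ (A != B), even if the counter already grew past
     * the old value. */
    if (strcmp(prev->name, cur_name) != 0)
        return 1;

    /* Case (b): same name and a larger count cannot be told apart
     * from normal operation by these two checks alone. */
    return 0;
}
```

Case (b) still goes unnoticed here, which is why an out-of-band signal such as a hotplug notification may be needed to cover it.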