On Tue, Nov 14, 2017 at 03:20:34PM -0500, Neil Horman wrote: > On Tue, Nov 14, 2017 at 07:20:33PM +0000, Vallish Vaidyeshwara wrote: > > We recently encountered a corner case bug in irqbalance where irqbalance > > was not balancing interrupts across 2 CPU's. > > > We've moved to github: > https://github.com/Irqbalance/irqbalance > > If you wouldn't mind opening a pull request or an issue for this, we can happily > take a look at it there > Hello Neil, I have opened a new issue to track this bug: https://github.com/Irqbalance/irqbalance/issues/61 Thanks. -Vallish > Best > Neil > > > The code snippet in question is: > > static void gather_load_stats(struct topo_obj *obj, void *data) > > { > > struct load_balance_info *info = data; > > > > if (info->min_load == 0 || obj->load < info->min_load) > > info->min_load = obj->load; > > info->total_load += obj->load; > > info->load_sources += 1; > > } > > > > The bug we encountered had 2 CPU's with load for the first CPU being 0. > > Eg: obj1->load = 0 and obj2->load = 5000. > > > > Iteration 1: obj1 is passed and info initialized to 0 is passed in > > as param. > > if (info->min_load == 0 || obj->load < info->min_load) > > info->min_load = obj->load; > > Because info->min_load is 0, info->min_load is set to obj->load which > > is also 0 in this case. > > > > Iteration 2: obj2 is passed with info having values set from previous > > iteration: > > info->min_load = 0, info->total_load = 0, info->load_sources = 1 > > > > if (info->min_load == 0 || obj->load < info->min_load) > > info->min_load = obj->load; > > Because of the logic used in gather_load_stats() as shown above, > > info->min_load gets set to obj2->load which is 5000. This is not the > > minimum load. Because of this bug, interrupts do not migrate as this > > value is checked in the function which migrates the interrupts. > > > > Test results: > > ------------- > > i3.large system - single socket, single core, hyperthreading enabled > > > > With fix: > > [ec2-user at ip-10-0-38-254 tmp]$ cat /proc/interrupts | grep "\<CPU0\>\|\<60\>\|\<61\>\|\<66\>\|\<67\>\|\<69\>\|\<70\>" > > CPU0 CPU1 > > 60: 1593819 292297 xen-pirq-msi-x nvme0q0, nvme0q1 > > 61: 456537 831877 xen-pirq-msi-x nvme0q2 > > 66: 1498 1644 xen-pirq-msi-x eth1-Tx-Rx-0 > > 67: 1535 1034 xen-pirq-msi-x eth1-Tx-Rx-1 > > 69: 3579919 8973629 xen-pirq-msi-x eth2-Tx-Rx-0 > > 70: 9265764 3747096 xen-pirq-msi-x eth2-Tx-Rx-1 > > > > Without fix: > > [ec2-user at ip-10-0-10-75 ~]$ cat /proc/interrupts | grep "\<CPU0\>\|\<60\>\|\<61\>\|\<66\>\|\<67\>\|\<69\>\|\<70\>" > > CPU0 CPU1 > > 60: 4014 1088758 xen-pirq-msi-x nvme0q0, nvme0q1 > > 61: 2046 2218741 xen-pirq-msi-x nvme0q2 > > 66: 11 2978 xen-pirq-msi-x eth1-Tx-Rx-0 > > 67: 8 43 xen-pirq-msi-x eth1-Tx-Rx-1 > > 69: 43 13773325 xen-pirq-msi-x eth2-Tx-Rx-0 > > 70: 4 13102621 xen-pirq-msi-x eth2-Tx-Rx-1 > > > > Vallish Vaidyeshwara (1): > > irqbalance: Fix min_load to pick actual min load across all objects > > > > irqlist.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > -- > > 2.7.5 > > > > > > _______________________________________________ > > irqbalance mailing list > > irqbalance at lists.infradead.org > > http://lists.infradead.org/mailman/listinfo/irqbalance > >