On 8/2/21 8:37 AM, Michael Ellerman wrote:
> Cédric Le Goater <clg@xxxxxxxx> writes:
>> On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at
>> runtime. Today, the IPI is not created for such nodes, and hot-plugged
>> CPUs use a bogus IPI, which leads to soft lockups.
>>
>> We could create the node IPI on demand but it is a bit complex because
>> this code would be called under bringup_up() and some IRQ locking is
>> being done. The simplest solution is to create the IPIs for all nodes
>> at startup.
>>
>> Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node")
>> Cc: stable@xxxxxxxxxxxxxxx # v5.13
>> Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@xxxxxxx>
>> Cc: Srikar Dronamraju <srikar@xxxxxxxxxxxxxxxxxx>
>> Signed-off-by: Cédric Le Goater <clg@xxxxxxxx>
>> ---
>>
>> This patch breaks old versions of irqbalance (<= v1.4). Possible nodes
>> are collected from /sys/devices/system/node/ but CPU-less nodes are
>> not listed there. When interrupts are scanned, the link representing
>> the node structure is NULL and segfault occurs.
>
> Breaking userspace is usually frowned upon, even if it is irqbalance.
>
> If CPU-less nodes appeared in /sys/devices/system/node would that fix
> it? Could we do that or is that not possible for other reasons?
>
>> Version 1.7 seems immune.
>
> Which was released in August 2020.
>
> Looks like some distros still ship 1.6, I take it you're not sure if
> that is broken or not.

I did a bisect on irqbalance and the "bad" commit was introduced between
version 1.7 and version 1.8:

  commit 31dea01f3a47 ("Also fetch node info for non-PCI devices")
  https://github.com/Irqbalance/irqbalance/commit/31dea01f3a47aa6374560638486879e5129f9c94

which was backported to RHEL 8 in RPM irqbalance-1.4.0-6.el8. Any distro
using irqbalance <= 1.7 without the patch above is fine.

Since irqbalance cleanly handled IRQs referencing offline nodes before
this patch, I am inclined to think that the irqbalance fix is incomplete.
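
To make the failure mode concrete, here is a minimal sketch of the kind of
guard I would expect on the userspace side. This is an illustration only and
not irqbalance's actual code: the type node_obj and the helpers find_node()
and account_irq() are hypothetical names. It assumes a node table built from
a scan of /sys/devices/system/node/, so a node missing from that scan makes
the lookup return NULL, and the caller has to check before dereferencing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a userspace NUMA node object. */
struct node_obj {
    int id;         /* NUMA node number */
    int irq_count;  /* IRQs attributed to this node */
};

/*
 * Hypothetical lookup in the table built from the sysfs scan.
 * Returns NULL when the node was not listed at scan time.
 */
struct node_obj *find_node(struct node_obj *nodes, size_t n, int id)
{
    assert(nodes != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (nodes[i].id == id)
            return &nodes[i];
    }
    return NULL;
}

/*
 * Attribute one IRQ to the node it references.
 * Returns 0 when the node is known, or -1 when the node is unknown
 * and the IRQ was counted against the fallback object instead.
 */
int account_irq(struct node_obj *nodes, size_t n,
                struct node_obj *fallback, int irq_node)
{
    struct node_obj *node = find_node(nodes, n, irq_node);

    if (node == NULL) {
        /* Unknown node: do not dereference, use the fallback. */
        fallback->irq_count++;
        return -1;
    }

    node->irq_count++;
    return 0;
}
```

Falling back to an "unspecified node" object is only one possible policy;
skipping the IRQ or rescanning the nodes would be other options.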
Unfortunately, the commit log of that irqbalance change lacks some context
on the non-PCI devices.

Thanks,

C.