On 2020-02-05 14:08, John Garry wrote:
On 03/02/2020 18:16, John Garry wrote:
On 03/02/2020 15:43, Marc Zyngier wrote:
On 2020-02-03 12:56, John Garry wrote:
[...]
Can you trigger it after disabling irqbalance?
No. I tested by killing the irqbalance process, and it ran for 25
minutes without issue.
OK, that's interesting.
Can you find out whether irqbalance tries to move an interrupt to an
offlined CPU?
Just putting a trace into git_set_affinity() should be enough.
Just an update here: I have tried this same test on a new model dev
board and I don't experience the same issue. It's quite stable.
Is it the exact same SoC? Or a revised version?
I'd like to get to the bottom of the issue reported, but I feel that
the root cause may be a BIOS issue and I will get next to no BIOS
support for that particular board. Hmmm.
I'd very much like to understand it too. Your latest log is even more
puzzling, as the backtrace shows a switch_to() even earlier... The fact
that this only happens on hotplug off tends to tell me that the firmware
gets confused with PSCI OFF. You'd see something like that if a CPU was
taken into the firmware (or powered-off) without the rest of the kernel
knowing...
Could it be that PSCI powers off more than a single CPU at once?
M.
--
Jazz is not dead. It just smells funny...