Re: Priority inversion for processes bound to a fixed CPU

Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> · Fri, 12 Nov 2021 15:54:52 +0100



On 2021-11-10 11:45:57 [+0100], Uwe Kleine-König wrote:
> Hello,
Hi Uwe,

> recently I debugged a problem on an -rt enabled kernel. The relevant
> part of the analysed trace looks as follows:
> 
>     napi/can0-10-360     [001] d...312  3565.642595: sched_pi_setprio: comm=candump pid=2182 oldprio=120 newprio=14
>     napi/can0-10-360     [001] d...212  3565.642619: sched_switch: prev_comm=napi/can0-10 prev_pid=360 prev_prio=14 prev_state=R ==> next_comm=cantest next_pid=915 next_prio=39
>     ....
> 	  rcuc/0-15      [000] d...212  3565.642633: sched_switch: prev_comm=rcuc/0 prev_pid=15 prev_prio=98 prev_state=R+ ==> next_comm=candump next_pid=2182 next_prio=14
> 	 candump-2182    [000] d...3..  3565.642646: sched_pi_setprio: comm=candump pid=2182 oldprio=14 newprio=120
> 
> So the napi/can0-10 wants to grab a mutex that candump is holding. So
> candump's priority is bumped from 120 to 14.
> 
> However the napi/can0-10 process (and a few others) are pinned to cpu #1
> and candump isn't allowed to run on that one. And so cpu #1 schedules a
> lower prio task while candump still has to wait a moment before being
> scheduled on cpu #0.
> 
> I wonder if it would be sensible in such a case not only to increase the
> importance of candump, but also to allow it to run on the cpu-set the
> boosting process is allowed to run on until it releases the mutex.
> 
> Would that make sense?

Sounds like your problem could be solved by allowing candump to run on
any CPU. Why not lift that restriction yourself?
From the trace, you have migration disabled for napi/can0, so that one
can't be moved. If the lock owned by candump is a spinlock_t, then
candump also has migration disabled and can't be moved either.
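
Lifting the restriction from userspace could look roughly like the
sketch below. It is only an illustration: allow_all_cpus() is a
hypothetical helper name, not something from candump or the kernel, and
it assumes glibc's sched_setaffinity() wrapper. It widens the affinity
mask of a single task to every configured CPU.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* glibc: exposes CPU_SET() and sched_setaffinity() */
#endif
#include <assert.h>
#include <sched.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from this thread): allow task `pid` to run on
 * every configured CPU. pid == 0 means the calling thread.
 * Returns 0 on success, -1 on failure (errno set by sched_setaffinity()).
 */
int allow_all_cpus(pid_t pid)
{
	cpu_set_t set;
	long ncpus = sysconf(_SC_NPROCESSORS_CONF);
	long cpu;

	assert(pid >= 0);
	if (ncpus < 1)
		return -1;

	/* Build a mask with one bit per configured CPU. */
	CPU_ZERO(&set);
	for (cpu = 0; cpu < ncpus && cpu < CPU_SETSIZE; cpu++)
		CPU_SET(cpu, &set);

	/*
	 * The kernel restricts the requested mask to the CPUs the task is
	 * actually permitted to use (e.g. by its cpuset).
	 */
	return sched_setaffinity(pid, sizeof(set), &set);
}
```

This changes the mask of one thread only, so a multi-threaded program
would need it per thread. It also does not help during a window in which
the task has migration disabled, as described above.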

> Best regards
> Uwe

Sebastian