Hi guys,

We are seeing some pretty significant degradation in some of our benchmarks when using the 5.10.59-rt52 or 5.14.1-rt19 kernels versus previous numbers on 4.14-rt, and I'm looking for thoughts/ideas on whether anything can be done about it.

The synthetic benchmark in question measures the latency of an "analog in" -> "analog out" loop. We have two versions of this benchmark: one that is interrupt based and one that polls on the I/O. The polling-mode benchmark is comparable between 5.x-rt and 4.14-rt. However, the most extreme case of the interrupt-based one shows up to 60% latency degradation on the 5.x-rt kernels.

I was able to confirm that the issue is caused by contention on the 'softirq_ctrl.lock' local lock in the softirq code. This results in event traces with the following pattern:

<idle>-0             [000]   582.023481: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=ksoftirqd/0 next_pid=12 next_prio=91
ksoftirqd/0-12       [000]   582.023485: irq_handler_entry: irq=102 name=atomicchinchk
ksoftirqd/0-12       [000]   582.023485: irq_handler_exit: irq=102 ret=handled
ksoftirqd/0-12       [000]   582.023486: sched_waking: comm=irq/102-atomicc pid=248 prio=1 target_cpu=000
ksoftirqd/0-12       [000]   582.023488: sched_wakeup: comm=irq/102-atomicc pid=248 prio=1 target_cpu=000
ksoftirqd/0-12       [000]   582.023490: rcu_utilization: Start context switch
ksoftirqd/0-12       [000]   582.023490: rcu_utilization: End context switch
ksoftirqd/0-12       [000]   582.023493: sched_switch: prev_comm=ksoftirqd/0 prev_pid=12 prev_prio=91 prev_state=R+ ==> next_comm=irq/102-atomicc next_pid=248 next_prio=1
irq/102-atomicc-248  [000]   582.023496: sched_pi_setprio: comm=ksoftirqd/0 pid=12 oldprio=91 newprio=1
irq/102-atomicc-248  [000]   582.023499: rcu_utilization: Start context switch
irq/102-atomicc-248  [000]   582.023499: rcu_utilization: End context switch
irq/102-atomicc-248  [000]   582.023501: sched_switch: prev_comm=irq/102-atomicc prev_pid=248 prev_prio=1 prev_state=R ==> next_comm=ksoftirqd/0 next_pid=12 next_prio=1
ksoftirqd/0-12       [000]   582.023503: softirq_entry: vec=7 [action=SCHED]
ksoftirqd/0-12       [000]   582.023508: softirq_exit: vec=7 [action=SCHED]
ksoftirqd/0-12       [000]   582.023510: sched_pi_setprio: comm=ksoftirqd/0 pid=12 oldprio=1 newprio=91
ksoftirqd/0-12       [000]   582.023513: sched_waking: comm=irq/102-atomicc pid=248 prio=1 target_cpu=000
ksoftirqd/0-12       [000]   582.023515: sched_wakeup: comm=irq/102-atomicc pid=248 prio=1 target_cpu=000
ksoftirqd/0-12       [000]   582.023519: rcu_utilization: Start context switch
ksoftirqd/0-12       [000]   582.023519: rcu_utilization: End context switch
ksoftirqd/0-12       [000]   582.023521: sched_switch: prev_comm=ksoftirqd/0 prev_pid=12 prev_prio=91 prev_state=S ==> next_comm=irq/102-atomicc next_pid=248 next_prio=1
irq/102-atomicc-248  [000]   582.023530: sched_waking: comm=S81039292 pid=2240 prio=40 target_cpu=001
irq/102-atomicc-248  [000]   582.023532: sched_wake_idle_without_ipi: cpu=1
irq/102-atomicc-248  [000]   582.023533: sched_wakeup: comm=S81039292 pid=2240 prio=40 target_cpu=001
irq/102-atomicc-248  [000]   582.023539: rcu_utilization: Start context switch
irq/102-atomicc-248  [000]   582.023540: rcu_utilization: End context switch
irq/102-atomicc-248  [000]   582.023542: sched_switch: prev_comm=irq/102-atomicc prev_pid=248 prev_prio=1 prev_state=S ==> next_comm=swapper/0 next_pid=0 next_prio=120

The 'irq/102-atomicc' irq thread (pid 248) is our high-priority data acquisition thread. The additional PI boost and context switch seem to account for the main difference I'm seeing versus 4.14-rt.
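For context, my current understanding of where that lock comes into play, based on reading the 5.10/5.14 softirq and irq-thread code, is sketched below. This is an abridged paraphrase for illustration, not the verbatim kernel/softirq.c source; the real functions also handle nesting, lockdep, and non-preemptible contexts:

/*
 * Abridged sketch of the PREEMPT_RT bottom-half serialization
 * (paraphrased from kernel/softirq.c; details omitted, not meant to build).
 */
struct softirq_ctrl {
	local_lock_t	lock;	/* on RT this is backed by a sleeping rtmutex */
	int		cnt;	/* per-CPU softirq-disable nesting count */
};

static DEFINE_PER_CPU(struct softirq_ctrl, softirq_ctrl) = {
	.lock	= INIT_LOCAL_LOCK(softirq_ctrl.lock),
};

void __local_bh_disable_ip(unsigned long ip, unsigned int cnt)
{
	/* First entry of this task into a BH-disabled section? */
	if (!current->softirq_disable_cnt && preemptible())
		local_lock(&softirq_ctrl.lock);	/* the lock contended in the trace above */

	__this_cpu_add(softirq_ctrl.cnt, cnt);
	current->softirq_disable_cnt += cnt;
}

/*
 * Forced-threaded irq handlers run with BHs disabled, so the irq/102
 * thread has to take softirq_ctrl.lock around our handler.  If
 * ksoftirqd already holds it while running pending softirqs, the
 * high-priority irq thread blocks on the rtmutex, PI-boosts ksoftirqd
 * (the sched_pi_setprio events above), and only gets to run once
 * ksoftirqd releases the lock - hence the extra context switches.
 */
static irqreturn_t irq_forced_thread_fn(struct irq_desc *desc,
					struct irqaction *action)
{
	irqreturn_t ret;

	local_bh_disable();
	ret = action->thread_fn(action, action->dev_id);
	local_bh_enable();
	return ret;
}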
This contention + PI boost happens with other lower-priority irq threads too, but the ksoftirqd case is the most common one. I would appreciate any thoughts on how/whether we could improve this.

Thanks,
Gratian