On 2025-01-29 13:04:15 [+0100], Pavel Pisa wrote:
> Hello Sebastian,

Hi Pavel,

> The actual one shows no IRQ work interrupts
> after the last reboot and an overnight test
>
> Linux mzapo 6.13.0-rc6-rt3-dut #1 SMP PREEMPT_RT
> Wed Jan 29 04:46:40 CET 2025 armv7l GNU/Linux
…
>             CPU0       CPU1
>   48:     314697          0  GIC-0  61 Level  can2
>   49:     314597          0  GIC-0  62 Level  can3
>   50:     314759          0  GIC-0  63 Level  can4
>   51:     311516          0  GIC-0  64 Level  can5
> IPI0:          0          0  CPU wakeup interrupts
> IPI1:          0          0  Timer broadcast interrupts
> IPI2:      17849     292126  Rescheduling interrupts
> IPI3:       5923      11315  Function call interrupts
> IPI4:          0          0  CPU stop interrupts
> IPI5:     271078      74040  IRQ work interrupts
> IPI6:          0          0  completion interrupts
> Err:           0
>
> So this does not seem to be the cause.

None you say? I see 271078 on CPU0 and 74040 on the other one.

> Yes, I think that the design mixing regular networking packet
> processing with CAN is the problem. We even test with a setup
> where the CAN interrupt priority is boosted to 90
>
> echo "-> Raise CAN irq priorities"
> PIDS=$(ps -e | grep -E 'irq/[0-9]+-can[3-4]' | tr -s ' ' | cut -d ' ' -f2)
> TXPID=$(ps -e | grep -E 'irq/[0-9]+-can2' | tr -s ' ' | cut -d ' ' -f2)
> chrt -f --pid 80 $TXPID
> for pid in $PIDS ; do
>     chrt -f --pid 85 $pid
> done

but boosting the prio does not help, because lock contention leads to
priority inheritance (PI) and the networking side forces its way
through anyway. The problem is that networking will simply continue.
You need to go to /proc/irq/${can_irq} and push the affinity to CPU1
(a minimal sketch follows at the end).

> Even this setup is problematic under load.

I would expect no change.

> to run daily testing. We can consider even something different,
> but this choice has been driven by the interest in something that
> is functional every day and ahead of mainline merges, so that we
> catch some problems in advance.

Oh okay.

> It is interesting that the in-kernel gateway is significantly worse
> now. It does not have the overhead of switching to userspace. But I
> am not sure whether it is invoked in some kernel worker which has a
> lower or the same real-time priority as Ethernet networking.
>
> In general, I think that the problem is that incoming
> packets (CAN and Ethernet) load the same per-CPU
> worker. There are even backlog_napi threads per CPU
>
>   46 TS  -  S  ?  00:00:00  [backlog_napi/0]
>   47 TS  -  S  ?  00:00:00  [backlog_napi/1]
>
> and they even run at TS priority. If I remember correctly, an
> option has been added to allocate a separate RX packet processing
> thread (instead of the default per-CPU one) for a given interface.
> But I have no experience with such a configuration.

backlog NAPI is used by devices which don't bring their own NAPI.

> Do you or somebody else have an idea how to achieve
> that, and whether it is legal to boost such a kernel thread's
> priority? It could help, because my general experience
> with PREEMPT_RT, even on this target, is very positive
> for tasks mapping HW directly and doing RT control.
> The same holds for the latency tester: no spikes above
> 250 usec even under load.

I wouldn't boost it unconditionally. If you enable tracing with
sched_switch, interrupts and maybe the net events, then you should
see what the flow of the CAN skb looks like (a tracefs sketch follows
below). I don't know if it touches backlog_napi. Ideally it
shouldn't. There shouldn't be anything that could interfere with it,
such as Ethernet traffic (say, ssh) or local sockets. Once you see
the regular flow, you should be able to tell what blocks it once you
stop the trace during a spike.

> Best wishes,
>
> Pavel

Sebastian
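
A minimal sketch of the affinity move suggested above, assuming the
Linux IRQ numbers 48-51 from the /proc/interrupts listing (they can
differ between boots; mask 0x2 selects CPU1):

  # Pin the CAN interrupts to CPU1 so Ethernet processing on CPU0
  # cannot get in their way. The threaded irq handlers follow the
  # hard irq affinity automatically.
  for irq in 48 49 50 51; do
      echo 2 > /proc/irq/$irq/smp_affinity
  done
  # equivalently, with a CPU list instead of a bitmask:
  #   echo 1 > /proc/irq/$irq/smp_affinity_list
  cat /proc/irq/48/smp_affinity

The Ethernet IRQ affinity can be checked the same way to make sure it
stays on CPU0, so the two workloads no longer share a CPU.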
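
The per-interface RX thread Pavel recalls is most likely threaded
NAPI, available per device since v5.12. It does not apply to the
backlog_napi threads themselves, only to drivers which bring their
own NAPI instance. A sketch, assuming the Ethernet interface is
named eth0:

  # Move eth0's NAPI processing out of softirq context into a
  # dedicated napi/eth0-* kernel thread:
  echo 1 > /sys/class/net/eth0/threaded
  ps -e | grep 'napi/eth0'
  # The result is an ordinary kthread, so chrt works on it; whether
  # changing its priority is wise is exactly the open question above.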
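
And a sketch of the tracing setup described above, using the tracefs
interface directly; trace-cmd record -e sched_switch -e irq -e net
would capture the same events:

  cd /sys/kernel/tracing
  echo 1 > events/sched/sched_switch/enable
  echo 1 > events/irq/enable
  echo 1 > events/net/enable
  echo 1 > tracing_on
  # ... reproduce a latency spike, then freeze and inspect:
  echo 0 > tracing_on
  cat trace > /tmp/can-spike-trace.txt

Following one CAN frame through the trace should show whether the skb
ever passes through backlog_napi and which task holds things up
during a spike.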