On 2023-07-03 09:47:26 [-0300], Wander Lairson Costa wrote:
> Dear all,

Hi,

> I am writing to report a splat issue we encountered while running the
> Real-Time (RT) kernel in conjunction with Network RPS (Receive Packet
> Steering).
>
> During some testing of the RT kernel version 6.4.0 with Network RPS enabled,
> we observed a splat occurring in the SoftIRQ subsystem. The splat message is as
> follows:
>
> [ 37.168920] ------------[ cut here ]------------
> [ 37.168925] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:291 do_softirq_post_smp_call_flush+0x2d/0x60
…
> [ 37.169060] ---[ end trace 0000000000000000 ]---
>
> It comes from [1].
>
> The issue lies in the mechanism of RPS to defer network packets processing to
> other CPUs. It sends an IPI to the to the target CPU. The registered callback
> is rps_trigger_softirq, which will raise a softirq, leading to the following
> scenario:
>
> CPU0                                  CPU1
> | netif_rx()                          |
> | | enqueue_to_backlog(cpu=1)         |
> | | | net_rps_send_ipi()              |
> |                                     | flush_smp_call_function_queue()
> |                                     | | was_pending = local_softirq_pending()
> |                                     | | __flush_smp_call_function_queue()
> |                                     | | rps_trigger_softirq()
> |                                     | | | __raise_softirq_irqoff()
> |                                     | | do_softirq_post_smp_call_flush()
>
> That has the undesired side effect of raising a softirq in a function call,
> leading to the aforementioned splat.

Correct.

> The kernel version is kernel-ark [1], os-build-rt branch. It is essentially the
> upstream kernel with the PREEMPT_RT patches, and with RHEL configs. I can
> provide the .config.

It is fine, I see it.

> The only solution I imagined so far was to modify RPS to process packtes in a
> kernel thread in RT. But I wonder how would be that be different than processing
> them in ksoftirqd.
>
> Any inputs on the issue?

Not sure how to proceed. One thing you could do is a hack similar to
net-Avoid-the-IPI-to-free-the.patch which does it for defer_csd.
On the other hand we could drop net-Avoid-the-IPI-to-free-the.patch and
remove the warning because we now have commit
   d15121be74856 ("Revert "softirq: Let ksoftirqd do its job"")

Prior to that, raising a softirq from hardirq would wake ksoftirqd which in
turn would collect all pending softirqs. As a consequence all following
softirqs (networking, …) would run as SCHED_OTHER and compete with
SCHED_OTHER tasks for resources. Not good because the networking work is
no longer processed within the networking interrupt thread. Also not a
DDoS kind of situation where one could want to delay processing.

With that change, this isn't the case anymore. Only an "unrelated" IRQ
thread could pick up the networking work which is less than ideal. That
is because the softirq is added to the global set, ksoftirqd is marked for
a wakeup and could be delayed because other tasks are busy. Then the disk
interrupt (for instance) could pick it up as part of its threaded
interrupt.

Now that I think about it, we could make the backlog pseudo device a
thread. NAPI threading enables one thread but here we would need one
thread per-CPU. So it would remain kind of special. But we would avoid
clobbering the global state and delaying everything to ksoftirqd.
Processing it in ksoftirqd might not be ideal from a performance point of
view.

> [1] https://elixir.bootlin.com/linux/latest/source/kernel/softirq.c#L306
>
> Cheers,
> Wander

Sebastian
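
For reference, the check behind the splat in the quoted diagram can be
modelled in user space. This is a minimal sketch under stated assumptions,
not the kernel code: every *_model name, the single global pending mask and
the bit index are illustrative stand-ins. Only the idea of comparing the
pending mask before and after the flush is taken from the scenario above.

```c
/* User-space model of the check behind the splat. Illustrative only:
 * all *_model names and the global mask are assumptions, not kernel API. */
#include <stdbool.h>
#include <stdio.h>

#define NET_RX_BIT_MODEL 3u /* arbitrary bit index chosen for this sketch */

/* Stands in for the per-CPU softirq pending mask. */
static unsigned int softirq_pending_model;

static unsigned int local_softirq_pending_model(void)
{
	return softirq_pending_model;
}

/* Models an smp-call callback that raises a softirq, as
 * rps_trigger_softirq() does in the quoted scenario. */
static void rps_trigger_softirq_model(void)
{
	softirq_pending_model |= 1u << NET_RX_BIT_MODEL;
}

/* Models a callback that leaves the pending mask untouched. */
static void quiet_callback_model(void)
{
}

/* Models the post-flush check: complain if the callback changed the mask. */
static bool post_flush_check_model(unsigned int was_pending)
{
	if (was_pending != local_softirq_pending_model()) {
		fprintf(stderr, "WARNING: softirq raised from smp-call callback\n");
		return true;
	}
	return false;
}

/* Snapshot the mask, run the callback, then compare.
 * Returns true if the check would have warned. */
static bool flush_smp_call_function_queue_model(void (*callback)(void))
{
	unsigned int was_pending = local_softirq_pending_model();

	callback();
	return post_flush_check_model(was_pending);
}
```

Calling flush_smp_call_function_queue_model(quiet_callback_model) returns
false, while passing rps_trigger_softirq_model returns true, which
corresponds to the warning in the report.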