On Tue, Jul 04, 2023 at 04:47:49PM +0200, Sebastian Andrzej Siewior wrote:
> On 2023-07-04 12:29:33 [+0200], Paolo Abeni wrote:
> > Just to hopefully clarify the networking side of it, napi instances !=
> > network backlog (used by RPS). The network backlog (RPS) is available
> > for all the network devices, including the loopback and all the
> > virtual ones.
>
> Yes.
>
> > The napi instances (and the threaded mode) are available only on
> > network device drivers implementing the napi model. The loopback
> > driver does not implement the napi model, as do most virtual devices
> > and even some H/W NICs (mostly low-end ones).
>
> Yes.
>
> > The network backlog can't run in threaded mode: there is no API/sysctl
> > nor infrastructure for that. The backlog processing threaded mode
> > could be implemented, even if it should not be completely trivial,
> > and it sounds a bit weird to me.
>
> Yes, I mean that this needs to be done.
>
> > Just for the record, I mentioned the following in the bz:
> >
> > It looks like flush_smp_call_function_queue() has only 2 callers,
> > migration and do_idle().
> >
> > What about moving softirq processing from
> > flush_smp_call_function_queue() into cpu_stopper_thread(), outside
> > the unpreemptable critical section?
>
> This doesn't solve anything. You schedule a softirq from hardirq and
> from this moment on you are in "anonymous context", and we solve this
> by processing it in ksoftirqd.
> For !RT you process it while leaving the hardirq. For RT, we can't.
> Processing it in the context of the currently running process (say
> idle, as in the reported backtrace, or another running user task)
> would lead to processing network-related work that originated
> somewhere else, at someone else's expense. Assume you have a high-prio
> RT task running, not related to networking at all, and suddenly you
> throw a bunch of skbs at it.
>
> Therefore it is preferred to process them within the interrupt thread
> in which the softirq was raised / within its origin.
>
> The other problem with ksoftirqd processing is that everything is
> added to a global state and then left for ksoftirqd to process. The
> global state is considered by every local_bh_enable() instance, so a
> random interrupt thread could process it, or even a random task doing
> a syscall involving spin_lock_bh().
>
> The NAPI threads are nice in that they don't clobber the global state.
> For RPS we would need either per-CPU threads or to serve this in
> ksoftirqd/X. The additional per-CPU thread only makes sense if it runs
> at a higher priority. However, without the priority it would be no
> different from ksoftirqd, unless it does only the backlog's work.
>
> puh. I'm undecided here. We might want to throw it into ksoftirqd and
> remove the warning. But then this will be processed along with other
> softirqs (like USB, due to tasklets) and might at some point be picked
> up by another interrupt thread.

Maybe, under RT, some softirqs should run in the context of the
"target" process. For NET_RX, for example, the softirqs would run in
the context of the packet recipient process.
Each task_struct would have a list of pending softirqs, which would be
checked at a few points, like on scheduling, when the process enters
the kernel, when a softirq is raised, etc.
The default target process would be ksoftirqd.

Does this idea make sense? (A rough sketch of what I mean is appended
at the end of this mail.)

> > Cheers,
> >
> > Paolo
>
> Sebastian
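
For reference, Sebastian's point above about the global state being
consulted by every local_bh_enable() instance can be seen in the !RT
implementation (roughly, abridged from kernel/softirq.c; accounting
and tracing details elided):

void __local_bh_enable_ip(unsigned long ip, unsigned int cnt)
{
	...
	/*
	 * Any task enabling BHs while softirqs are pending processes
	 * them right here, in its own context, regardless of where
	 * they were raised:
	 */
	if (unlikely(!in_interrupt() && local_softirq_pending()))
		do_softirq();
	...
}

So whichever task happens to call spin_unlock_bh() (or otherwise
re-enable BHs) first ends up paying for the work.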
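
And here is the promised sketch of the per-task pending-softirq idea.
Nothing below exists in the kernel today: struct task_softirq,
raise_softirq_on_task(), task_do_pending_softirqs(), the handler table
and the softirq_pending list/lock in task_struct are all invented for
illustration, and the locking/lifetime details are glossed over.

#include <linux/list.h>
#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

/* One queued softirq instance, owned by its target task. */
struct task_softirq {
	struct list_head	entry;
	unsigned int		nr;	/* NET_RX_SOFTIRQ, ... */
	void			*data;	/* e.g. the backlog to process */
};

/*
 * Imagined additions to struct task_struct:
 *	struct list_head	softirq_pending;
 *	raw_spinlock_t		softirq_pending_lock;
 *
 * The real handlers live in softirq_vec[]; modelled here as a plain
 * function table so the sketch is self-contained.
 */
typedef void (*task_softirq_fn)(void *data);
extern task_softirq_fn task_softirq_handlers[NR_SOFTIRQS];

/*
 * Called instead of raise_softirq() where the target is known, e.g.
 * the packet recipient for NET_RX. A NULL target falls back to
 * ksoftirqd, preserving today's behaviour.
 */
static void raise_softirq_on_task(struct task_struct *target,
				  unsigned int nr, void *data)
{
	struct task_softirq *ts = kmalloc(sizeof(*ts), GFP_ATOMIC);
	unsigned long flags;

	if (!ts)
		return;			/* error handling elided */

	if (!target)
		target = this_cpu_ksoftirqd();	/* default target */

	ts->nr = nr;
	ts->data = data;

	raw_spin_lock_irqsave(&target->softirq_pending_lock, flags);
	list_add_tail(&ts->entry, &target->softirq_pending);
	raw_spin_unlock_irqrestore(&target->softirq_pending_lock, flags);

	wake_up_process(target);	/* ensure it reaches a drain point */
}

/*
 * Drain point: would be called on scheduling, on kernel entry, on
 * softirq raise, etc. Runs in the context -- and at the priority --
 * of the task the work was queued for.
 */
static void task_do_pending_softirqs(struct task_struct *tsk)
{
	struct task_softirq *ts, *tmp;
	unsigned long flags;
	LIST_HEAD(todo);

	raw_spin_lock_irqsave(&tsk->softirq_pending_lock, flags);
	list_splice_init(&tsk->softirq_pending, &todo);
	raw_spin_unlock_irqrestore(&tsk->softirq_pending_lock, flags);

	list_for_each_entry_safe(ts, tmp, &todo, entry) {
		list_del(&ts->entry);
		task_softirq_handlers[ts->nr](ts->data);
		kfree(ts);
	}
}

The part that seems attractive to me is that the wakeup and the
priority come from the target task itself, so a high-prio RT task
never processes someone else's skbs; the cost is the extra allocation
and bookkeeping per raised softirq.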