On 24.03.23 18:20, Jakub Kicinski wrote:
> On Fri, 24 Mar 2023 18:13:14 +0100 Felix Fietkau wrote:
>> When dealing with few flows or an imbalance on CPU utilization, static RPS
>> CPU assignment can be too inflexible. Add support for enabling threaded NAPI
>> for backlog processing in order to allow the scheduler to better balance
>> processing. This helps better spread the load across idle CPUs.
>
> Can you explain the use case a little bit more?
I'm primarily testing this on routers with 2 or 4 CPUs and limited
processing power, handling routing/NAT. RPS is typically needed to
properly distribute the load across all available CPUs. When there is
only a small number of flows that are pushing a lot of traffic, a static
RPS assignment often leaves some CPUs idle while others are fully loaded
and become the bottleneck. Threaded NAPI reduces this a bit, but a single
NAPI thread alone can still fully load a CPU and turn it into a bottleneck.
Making backlog processing threaded helps split up the processing work
even more and distribute it onto remaining idle CPUs.
It can basically be used to make RPS a bit more dynamic and
configurable, because you can assign multiple backlog threads to a set
of CPUs and selectively steer packets from specific devices / rx queues
to them and allow the scheduler to take care of the rest.
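
As a rough illustration of the steering half of this, here is a minimal
Python sketch that builds the hex CPU bitmap for a set of CPUs and writes
it to the standard per-queue rps_cpus sysfs file. The helper names
(cpus_to_rps_mask, rps_path, steer_queue) and the device name in the
usage note are made up for the example. Pinning the backlog threads
themselves is not shown, since how they are named and enabled depends on
the patch.

```python
def cpus_to_rps_mask(cpus):
    """Return the hex CPU bitmap string for the given CPU ids.

    The kernel's cpumask format is hexadecimal, split into
    comma-separated 32-bit groups, most significant group first.
    """
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU ids must be non-negative")
        mask |= 1 << cpu

    hex_str = f"{mask:x}"
    groups = []
    # Peel off 8 hex digits (32 bits) at a time, starting from the low end.
    while hex_str:
        groups.append(hex_str[-8:])
        hex_str = hex_str[:-8]
    return ",".join(reversed(groups))


def rps_path(dev, queue):
    """Path of the RPS CPU mask file for one rx queue of a device."""
    return f"/sys/class/net/{dev}/queues/rx-{queue}/rps_cpus"


def steer_queue(dev, queue, cpus):
    """Steer packets from one rx queue to the backlogs of the given CPUs.

    Hypothetical helper: needs root and an existing device/queue.
    """
    with open(rps_path(dev, queue), "w") as f:
        f.write(cpus_to_rps_mask(cpus))
```

For example, steer_queue("eth0", 0, [2, 3]) would write "c" to
/sys/class/net/eth0/queues/rx-0/rps_cpus, assuming a device named eth0
with at least one rx queue.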
- Felix