On Wed, 04 Nov 2020 18:36:08 +0100 Paolo Abeni wrote:
> On Tue, 2020-11-03 at 08:52 -0800, Jakub Kicinski wrote:
> > On Tue, 03 Nov 2020 16:22:07 +0100 Paolo Abeni wrote:
> > > The relevant use case is a host running containers (with the related
> > > orchestration tools) in an RT environment. Virtual devices (veths, ovs
> > > ports, etc.) are created by the orchestration tools at run-time.
> > > Critical processes are allowed to send packets/generate outgoing
> > > network traffic - but any interrupt is moved away from the related
> > > cores, so that the usual incoming network traffic processing does not
> > > happen there.
> > >
> > > Still, an xmit operation on a virtual device may be forwarded via ovs
> > > or veth, with the relevant forwarding operation happening in a softirq
> > > on the same CPU originating the packet.
> > >
> > > RPS is configured (even) on such virtual devices to move the
> > > forwarding away from the relevant CPUs.
> > >
> > > As Saeed noted, such configuration could possibly be performed via some
> > > user-space daemon monitoring network device and network namespace
> > > creation. That will anyway be prone to a race: the orchestration tool
> > > may create and enable the netns and virtual devices before the daemon
> > > has properly set the RPS mask.
> > >
> > > In the latter scenario some packet forwarding could still slip onto the
> > > relevant CPU, causing measurable latency. In all non-RT scenarios the
> > > above will likely be irrelevant, but in the RT context it is not
> > > acceptable - e.g. in real environments it causes latency above the
> > > defined limits, while the proposed patches avoid the issue.
> > >
> > > Do you see any other simple way to avoid the above race?
> > >
> > > Please let me know if the above answers your doubts,
> >
> > Thanks, that makes it clearer now.
> >
> > Depending on how RT-aware your container management is, it may or may not
> > be the right place to configure this, as it creates the veth interface.
> > Presumably it's the container management which does the placement of
> > the tasks to cores, why is it not setting other attributes, like RPS?
>
> The container orchestration is quite complex, and I'm unsure whether
> isolation and networking configuration are performed (or can be
> performed) by the same process (without a heavy refactor).
>
> On the flip side, the global RPS mask knob looked quite
> straightforward to me.

I understand, but I can't shake the feeling this is a hack. Whatever
sets the CPU isolation should take care of the RPS settings.

> Possibly I can reduce the amount of new code introduced by this
> patchset by removing some code duplication between
> rps_default_mask_sysctl() and flow_limit_cpu_sysctl(). Would
> that make this change more acceptable? Or should I drop this
> altogether?

I'm leaning towards dropping it altogether, unless you can get some
support/review tags from other netdev developers. So far it appears we
only got a down vote from Saeed.

> > Also I wonder if it would make sense to turn this knob into something
> > more generic. When we arrive at threaded NAPIs - could it make
> > sense for the threads to inherit your mask as the CPUs they are allowed
> > to run on?
>
> I personally *think* this would be fine - and good. But isn't it a bit
> premature to discuss the integration of two missing pieces? :)
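
[Editor's note: for readers following the thread, a minimal sketch of the
two configuration paths under discussion. The mask value 0x3 (CPUs 0-1 as
housekeeping cores) and the device names are placeholders; the sysctl name
is inferred from the rps_default_mask_sysctl() handler mentioned above,
so treat it as the series' proposal rather than an existing upstream knob.]

    # (a) User-space daemon approach: react to device creation and write
    #     the per-queue RPS mask via sysfs. This is the racy variant -
    #     forwarding may land on an isolated CPU before these writes
    #     take effect.
    for q in /sys/class/net/veth0/queues/rx-*/rps_cpus; do
        echo 3 > "$q"   # steer RPS to CPUs 0-1, keep isolated cores free
    done

    # (b) Proposed knob: set a default mask once; devices created
    #     afterwards (e.g. by the container runtime) inherit it at
    #     creation time, closing the race window.
    sysctl -w net.core.rps_default_mask=3
    ip link add veth0 type veth peer name veth1
    # rx queues of veth0/veth1 now start with rps_cpus == 3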