On Wed, 04 Nov 2020 18:36:08 +0100 Paolo Abeni wrote:
> On Tue, 2020-11-03 at 08:52 -0800, Jakub Kicinski wrote:
> > On Tue, 03 Nov 2020 16:22:07 +0100 Paolo Abeni wrote:
> > > The relevant use case is a host running containers (with the related
> > > orchestration tools) in an RT environment. Virtual devices (veths, ovs
> > > ports, etc.) are created by the orchestration tools at run-time.
> > > Critical processes are allowed to send packets/generate outgoing
> > > network traffic - but any interrupt is moved away from the related
> > > cores, so that the usual incoming network traffic processing does not
> > > happen there.
> > >
> > > Still, an xmit operation on a virtual device may be forwarded via ovs
> > > or veth, with the relevant forwarding operation happening in a softirq
> > > on the same CPU originating the packet.
> > >
> > > RPS is configured (even) on such virtual devices to move the
> > > forwarding away from the relevant CPUs.
> > >
> > > As Saeed noted, such configuration could possibly be performed via some
> > > user-space daemon monitoring network device and network namespace
> > > creation. That will anyway be prone to a race: the orchestration tool
> > > may create and enable the netns and virtual devices before the daemon
> > > has properly set the RPS mask.
> > >
> > > In the latter scenario some packet forwarding could still slip onto the
> > > relevant CPU, causing measurable latency. In all non-RT scenarios the
> > > above will likely be irrelevant, but in the RT context it is not
> > > acceptable - e.g. in real environments it causes latency above the
> > > defined limits, while the proposed patches avoid the issue.
> > >
> > > Do you see any other simple way to avoid the above race?
> > >
> > > Please let me know if the above answers your doubts,
> >
> > Thanks, that makes it clearer now.
> >
> > Depending on how RT-aware your container management is, it may or may not
> > be the right place to configure this, as it creates the veth interface.
> > Presumably it's the container management which does the placement of
> > the tasks to cores, why is it not setting other attributes, like RPS?
>
> The container orchestration is quite complex, and I'm unsure whether
> isolation and networking configuration are performed (or can be
> performed) by the same process (without a heavy refactor).
>
> On the flip side, the global RPS mask knob looked quite
> straightforward to me.

I understand, but I can't shake the feeling this is a hack. Whatever
sets the CPU isolation should take care of the RPS settings.

> Possibly I can reduce the amount of new code introduced by this
> patchset by removing some code duplication between
> rps_default_mask_sysctl() and flow_limit_cpu_sysctl(). Would
> that make this change more acceptable? Or should I drop this
> altogether?

I'm leaning towards dropping it altogether, unless you can get some
support/review tags from other netdev developers. So far it appears we
only got a down vote from Saeed.

> > Also I wonder if it would make sense to turn this knob into something
> > more generic. When we arrive at threaded NAPIs - could it make
> > sense for the threads to inherit your mask as the CPUs they are allowed
> > to run on?
>
> I personally *think* this would be fine - and good. But isn't it a bit
> premature to discuss the integration of two missing pieces? :)
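
[Editor's note: for readers following the thread, a minimal sketch of the
two configuration paths under discussion. The mask value 0x3 (CPUs 0-1 as
housekeeping cores) and the device names are placeholders; the sysctl name
is inferred from the rps_default_mask_sysctl() handler mentioned above,
so treat it as the series' proposal rather than an existing upstream knob.]

    # (a) User-space daemon approach: react to device creation and write
    #     the per-queue RPS mask via sysfs. This is the racy variant -
    #     forwarding may land on an isolated CPU before these writes
    #     take effect.
    for q in /sys/class/net/veth0/queues/rx-*/rps_cpus; do
        echo 3 > "$q"   # steer RPS to CPUs 0-1, keep isolated cores free
    done

    # (b) Proposed knob: set a default mask once; devices created
    #     afterwards (e.g. by the container runtime) inherit it at
    #     creation time, closing the race window.
    sysctl -w net.core.rps_default_mask=3
    ip link add veth0 type veth peer name veth1
    # rx queues of veth0/veth1 now start with rps_cpus == 3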