Hi all,

On Wed, 2020-11-04 at 12:42 -0700, Jakub Kicinski wrote:
> On Wed, 04 Nov 2020 18:36:08 +0100 Paolo Abeni wrote:
> > On Tue, 2020-11-03 at 08:52 -0800, Jakub Kicinski wrote:
> > > On Tue, 03 Nov 2020 16:22:07 +0100 Paolo Abeni wrote:
> > > > The relevant use case is a host running containers (with the
> > > > related orchestration tools) in an RT environment. Virtual
> > > > devices (veths, ovs ports, etc.) are created by the
> > > > orchestration tools at run-time. Critical processes are allowed
> > > > to send packets/generate outgoing network traffic - but any
> > > > interrupt is moved away from the related cores, so that the
> > > > usual incoming network traffic processing does not happen there.
> > > >
> > > > Still, a packet transmitted on a virtual device may be forwarded
> > > > via ovs or veth, with the relevant forwarding operation happening
> > > > in a softirq on the same CPU that originated the packet.
> > > >
> > > > RPS is configured (even) on such virtual devices to move the
> > > > forwarding away from the relevant CPUs.
> > > >
> > > > As Saeed noted, such configuration could possibly be performed
> > > > via some user-space daemon monitoring network device and network
> > > > namespace creation. That will anyway be prone to a race: the
> > > > orchestration tool may create and enable the netns and virtual
> > > > devices before the daemon has properly set the RPS mask.
> > > >
> > > > In the latter scenario some packet forwarding could still slip
> > > > onto the relevant CPU, causing measurable latency. In all non-RT
> > > > scenarios the above will likely be irrelevant, but in the RT
> > > > context it is not acceptable - e.g. in real environments it
> > > > causes latency above the defined limits, while the proposed
> > > > patches avoid the issue.
> > > >
> > > > Do you see any other simple way to avoid the above race?
> > > >
> > > > Please let me know if the above answers your doubts,
> > >
> > > Thanks, that makes it clearer now.
> > >
> > > Depending on how RT-aware your container management is, it may or
> > > may not be the right place to configure this, as it creates the
> > > veth interface. Presumably it's the container management which
> > > does the placement of the tasks to cores; why is it not setting
> > > other attributes, like RPS?
> >
> > The container orchestration is quite complex, and I'm unsure whether
> > isolation and networking configuration are performed (or can be
> > performed) by the same process (without a heavy refactor).
> >
> > On the other hand, the global rps mask knob looked quite
> > straightforward to me.
>
> I understand, but I can't shake the feeling this is a hack.
> Whatever sets the CPU isolation should take care of the RPS settings.

Let me try for a moment to revive this old thread.

The series proposed a new sysctl knob implementing a global/default rps
mask applying to all the network devices, as a way to simplify some RT
setups. It has been rejected as the required task is doable in
user-space.

Currently the orchestration infrastructure does that, setting the
per-device, per-queue rps mask as well as the CPU isolation.

The above leads to a side problem: when there are lots of netns/devices
with several queues, even a reasonably optimized user-space solution
takes a significant amount of time to traverse the relevant sysfs dirs
and do I/O on them. Overall the additional time required is very
measurable, easily in the range of seconds.

The default_rps_mask would basically kill that overhead.

Is the above a suitable use case?

Thanks,

Paolo
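
P.S.: for concreteness, the per-queue configuration user space has to
perform today looks roughly like the sketch below. This is not the
actual orchestration code, just an illustration: only the standard
/sys/class/net/<dev>/queues/rx-<n>/rps_cpus paths are assumed, the mask
value is a placeholder and the per-netns handling (setns()/'ip netns
exec') is omitted.

#!/usr/bin/env python3
# Rough sketch (not the real tooling) of the per-device, per-queue RPS
# configuration user space performs today.
import glob

RPS_MASK = "f0"   # placeholder: keep the isolated CPUs out of RPS

def set_rps_masks(mask):
    # one open() + write() per rx queue of every device; with many
    # netns/devices/queues this is where the seconds add up
    for path in glob.glob("/sys/class/net/*/queues/rx-*/rps_cpus"):
        try:
            with open(path, "w") as f:
                f.write(mask)
        except OSError:
            pass  # the device may already be gone

if __name__ == "__main__":
    # must be repeated inside every netns and re-run whenever the
    # orchestration creates new devices
    set_rps_masks(RPS_MASK)

With a default mask the whole loop would collapse into a single write
done once at boot, before any netns or device is created.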