Ok, I have actually started an open source project that may make use of the oneshot interface. It is a bridging tool between two RDMA protocols called ib2roce. See https://gentwo.org/christoph/2022-bridging-rdma.pdf

The relevant code can be found at https://github.com/clameter/rdma-core/tree/ib2roce/ib2roce. In particular, look at the ib2roce.c source code. This is still under development.

The ib2roce bridging can run in a busy loop mode (-k option) where it spins on ibv_poll_cq(), an RDMA call that handles incoming packets without kernel interaction. See busyloop() in ib2roce.c.

Currently I have configured the system to use CONFIG_NOHZ_FULL. With that I am able to reliably forward packets at a rate that saturates 100G Ethernet / EDR InfiniBand from a single spinning thread. Without CONFIG_NOHZ_FULL, any slight disturbance causes the forwarding to fall behind, which leads to dramatic packet loss since we are looking at a potential data rate of 12.5 GByte/sec, i.e. about 12.5 MByte per msec. If the kernel interrupts the forwarding for, say, 10 msec, then we fall behind by 125 MB, which would have to be buffered and processed by additional code. That added complexity makes packet processing much slower, which could slow the forwarding down so far that recovery is no longer possible should the data continue to arrive at line rate.

Isolation of the threads was done through the following kernel parameters:

nohz_full=8-15,24-31 rcu_nocbs=8-15,24-31 poll_spectre_v2=off numa_balancing=disable rcutree.kthread_prio=3 intel_pstate=disable nosmt

And systemd was configured with the following affinities:

system.conf:CPUAffinity=0-7,16-23

This means that the second socket will generally be free of tasks and kernel threads.
The NUMA configuration:

$ numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 94798 MB
node 0 free: 92000 MB
node 1 cpus: 8 9 10 11 12 13 14 15
node 1 size: 96765 MB
node 1 free: 96082 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10

I could modify busyloop() in ib2roce.c to use the oneshot mode via prctl() provided by this patch instead of NOHZ_FULL. What kind of metric could I use to show the difference in idleness, i.e. the quality of the CPU isolation?

The ib2roce tool already has a CLI mode where one can monitor the latencies that the busyloop experiences. See the latency calculations in busyloop() and the CLI command "core". Stats can be reset via the "zap" command.

I can see the usefulness of the oneshot mode, but (I am very very sorry) I still think that this patchset overdoes what is needed, and I fail to understand what the point of inheritance, per-syscall quiescing etc. is. Those cause needless overhead in syscall handling and increase the complexity of managing a busyloop.

Special handling when the scheduler switches a task? If tasks that require low latency and no disturbances are being switched, then something went very very wrong with the system configuration, and the only thing I would suggest is to issue a kernel warning that this is not the way one should configure the system.