On Fri, Mar 14, 2025 at 02:53:40PM +0000, Bernard Metzler wrote:
> I assume the correct way forward is to first clarify the
> structure of all user-visible objects that need to be
> created/controlled/destroyed, and to route them through
> this interface. Some will require extensions to given objects,
> some may be new, some will be as-is. rdma_netlink will probably
> be the right interface to look at for job control.

As I understand the job ID model, you will need some privileged entity to create a "job ID file descriptor" that can be passed around to unprivileged processes to grant them access to the job ID. This is necessary because the job ID becomes part of the packet headers, and we must stop userspace from hijacking or spoofing these values on the wire.

Netlink has a major downside: you can't use filesystem ACL permissions to control access to it, so building a low-privilege daemon just to do job ID management seems more difficult to me.

As an example, I would imagine having a job management char device with a filesystem ACL that only allows something like SLURM's privileged orchestrator to talk to it. SLURM wouldn't need something like CAP_NET_ADMIN. SLURM would set up the job ID and pass the "Job ID FD" to the actual MPI workload processes to grant them permission to use those network headers. Nobody else in the system can create job IDs besides SLURM, and in a multi-user environment one user cannot reach into another's processes and hijack their job ID, because the FD does not leak outside the MPI process tree.

This RFC doesn't describe the intended security model, but I'm very surprised to see that ultraeth_nl_job_new_doit() does not do any capability checks, or any security whatsoever around access to the job.

It should be obvious that this would be a fairly trivial add-on to rdma: a new char dev, some rdma netlink to report and inspect the global job list, and a little driver helper to associate job FDs with uverbs objects and retrieve the job ID.
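To illustrate only the hand-off step described above: once the orchestrator holds a job FD, ordinary Unix FD passing is enough to give it to the workload processes, either by fork() inheritance or explicitly over an AF_UNIX socket with SCM_RIGHTS. The following is a minimal userspace sketch, assuming the job FD behaves like any other file descriptor. The job management char device is hypothetical, and send_fd()/recv_fd() are illustrative helper names, not an existing RDMA or uverbs API.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one passed fd. */
union fd_ctrl {
	struct cmsghdr hdr;
	char buf[CMSG_SPACE(sizeof(int))];
};

/*
 * Send one fd over a connected AF_UNIX socket. Here 'fd' would be a
 * (hypothetical) job ID FD opened by the privileged orchestrator.
 * Returns 0 on success, -1 on error.
 */
static int send_fd(int sock, int fd)
{
	char byte = 'J'; /* SCM_RIGHTS needs at least one byte of data */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_ctrl ctrl;
	struct msghdr msg = {0};
	struct cmsghdr *cmsg;

	memset(&ctrl, 0, sizeof(ctrl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	assert(cmsg != NULL); /* buffer is large enough by construction */
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/*
 * Receive one fd sent by send_fd(). The kernel installs a new fd in
 * the receiving process that refers to the same open file.
 * Returns the new fd, or -1 on error.
 */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_ctrl ctrl;
	struct msghdr msg = {0};
	struct cmsghdr *cmsg;
	int fd;

	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) != 1 || (msg.msg_flags & MSG_CTRUNC))
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

For a quick check without any job device, a socketpair() can stand in for the orchestrator-to-worker channel and one end of a pipe() for the job FD: the receiver gets a new fd number that refers to the same open file. The design point is that the permission lives in the open file itself, so only processes the orchestrator explicitly hands it to (or that inherit it) can use it.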
In fact, we just did something similar with the UCAP system.

Further, jobs are not a concept unique to UE; I am seeing other RDMA scenarios talk about jobs now, perhaps inspired by UE. There is zero reason to make jobs a UE-specific concept rather than a general RDMA concept.

Jason