On 3/19/25 15:52, Jason Gunthorpe wrote:
> On Fri, Mar 14, 2025 at 02:53:40PM +0000, Bernard Metzler wrote:
>
>> I assume the correct way forward is to first clarify the
>> structure of all user-visible objects that need to be
>> created/controlled/destroyed, and to route them through
>> this interface. Some will require extensions to given objects,
>> some may be new, some will be as-is. rdma_netlink will probably
>> be the right interface to look at for job control.
>
> As I understand the job ID model you will need to have some privileged
> entity to create a "job ID file descriptor" that can be passed around
> to unprivileged processes to grant them access to the job ID. This is
> necessary since the Job ID becomes part of the packet headers and we
> must secure userspace to prevent it from hijacking or spoofing these
> values on the wire.
>
> Netlink has a major downside that you can't use filesystem ACL
> permissions to control access, so building a low-privilege daemon just
> to do job ID management seems to me to be more difficult.
>
> As an example, I would imagine having a job management char device
> with a filesystem ACL that only allows something like SLURM's
> privileged orchestrator to talk to it. SLURM wouldn't have something
> like CAP_NET_ADMIN. SLURM would set up the job ID and pass the "Job ID
> FD" to the actual MPI workload processes to grant them permission to
> use those network headers.
>
> Nobody else in the system can create Job IDs besides SLURM, and in a
> multi-user environment one user cannot reach into the other and hijack
> their job ID because the FD does not leak outside the MPI process
> tree.
>
> This RFC doesn't describe the intended security model, but I'm very
> surprised to see ultraeth_nl_job_new_doit() not do any capability
> checks, or any security whatsoever around access to the job.
>
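The "job ID file descriptor" model above relies on ordinary fd passing between processes. As a hedged illustration only: the standard Linux way to hand an open descriptor to another process is an AF_UNIX socket carrying an SCM_RIGHTS control message. The sketch below uses a pipe as a stand-in for the job handle, because the job management char device is only a proposal; send_fd(), recv_fd() and demo_pass_job_fd() are hypothetical helper names, not part of any existing ultraeth or RDMA API.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one open file descriptor over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
    char byte = 'J'; /* one byte of ordinary data accompanies the control message */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align; /* guarantees correct alignment of buf */
    } u;
    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new fd, or -1 on failure. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = {0};
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1 || (msg.msg_flags & MSG_CTRUNC))
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET || cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Hypothetical demo: the "orchestrator" end hands a pipe (stand-in for a
 * job handle) to the "workload" end, which then uses it. */
static int demo_pass_job_fd(void)
{
    int sv[2], p[2];
    char buf[8] = {0};
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (pipe(p) < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (send_fd(sv[0], p[0]) == 0) {
        int got = recv_fd(sv[1]);
        if (got >= 0) {
            /* A new descriptor number referring to the same open pipe. */
            assert(got != p[0]);
            if (write(p[1], "job42", 5) == 5 &&
                read(got, buf, sizeof(buf) - 1) == 5 &&
                strcmp(buf, "job42") == 0)
                rc = 0;
            close(got);
        }
    }
    close(p[0]);
    close(p[1]);
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

In a real deployment the two socket ends would live in different processes (for example the orchestrator and an MPI rank), and only processes that receive the descriptor gain access to whatever it represents.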
It doesn't need to do any capability checking because the permission is
defined in the YAML model. There you can see

  flags: [ admin-perm ]

so the automatically generated genl ops code gets

  .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO,

for these ops, which in turn means the genetlink core checks that the
caller has CAP_NET_ADMIN (and returns -EPERM otherwise) before the doit
handler is ever called.

The unprivileged process can request to be associated with multiple jobs,
and it is the privileged process that has to configure and control them.
This version only covers configuration. Once the specs become publicly
available we will be able to share more information about how this is
expected to work.

Cheers,
 Nik
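To make the genetlink side concrete, here is a simplified user-space model (not kernel code) of the check described above. The two flag values are copied from include/uapi/linux/genetlink.h; the struct, the command number and the handler are hypothetical stand-ins for the kernel's struct genl_split_ops and for ultraeth_nl_job_new_doit(), and the boolean argument replaces the real netlink_capable(skb, CAP_NET_ADMIN) test performed in genl_family_rcv_msg().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Values as defined in include/uapi/linux/genetlink.h. */
#define GENL_ADMIN_PERM 0x01
#define GENL_CMD_CAP_DO 0x02

/* Hypothetical, simplified stand-in for struct genl_split_ops. */
struct model_split_op {
    unsigned char cmd;
    int (*doit)(void);
    unsigned char flags;
};

/* Hypothetical handler: pretend job creation always succeeds. */
static int model_job_new_doit(void)
{
    return 0;
}

/* Roughly what the generator emits for an op marked
 * "flags: [ admin-perm ]" (command number is made up). */
static const struct model_split_op model_ops[] = {
    { .cmd = 1, .doit = model_job_new_doit,
      .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO },
};

/* Simplified model of the genetlink core's dispatch: the permission
 * decision is made here, before the handler runs. */
static int model_genl_dispatch(const struct model_split_op *op,
                               bool caller_has_net_admin)
{
    assert(op != NULL);
    if ((op->flags & GENL_ADMIN_PERM) && !caller_has_net_admin)
        return -EPERM;
    if (!op->doit)
        return -EOPNOTSUPP;
    return op->doit();
}
```

The point of the model is that the doit handler never sees a request from a caller lacking CAP_NET_ADMIN; the decision lives in the generated ops table and the genetlink core.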