Hi Al, On Jun 03 2024, Al Viro wrote: > static int hid_bpf_insert_prog(int prog_fd, struct bpf_prog *prog) > { > int i, index = -1, map_fd = -1, err = -EINVAL; > > /* retrieve a fd of our prog_array map in BPF */ > map_fd = skel_map_get_fd_by_id(jmp_table.map->id); > > ... > > /* insert the program in the jump table */ > err = skel_map_update_elem(map_fd, &index, &prog_fd, 0); > if (err) > goto out; > > ... > > if (err < 0) > __hid_bpf_do_release_prog(map_fd, index); > if (map_fd >= 0) > close_fd(map_fd); > return err; > } > > What. The. Hell? TL;DR: I'm getting rid of that code in (hopefully v6.11), so any bad justification I can come up with can safely be ignored. > > Folks, descriptor table is a shared object. It is NOT safe to use > as a scratchpad. > > Another thread might do whatever it bloody wants to anything inserted > there. It may close your descriptor, it may replace it with something > entirely unrelated to what you've placed there, etc. > > This is fundamentally broken. The same goes for anything that tries > to play similar games. Don't use descriptor tables that way. > > Kernel-side descriptors should be used only for marshalling - they can > be passed by userland (and should be resolved to struct file *, with > no expectations that repeated call of fget() would yield the same > pointer) and they can be returned _to_ userland - after you've allocated > them and associated them with struct file *. I think the situation is not as bad as you think it is. First these fds are not actually struct file * but bpf objects. So the users of them should be rather small. Threading in this case is probably fine because this only happens under a mutex lock. And last, and I think that's the biggest point here, we are not really "in the kernel". The code is executed in the kernel, but it's coming from a syscall bpf program, and as such we are in a blurry state between userspace and kernel space. Then, the bpf API uses fds because we are effectively going through the userspace API from within the kernel to populate the maps. At any point here we don't assume the fd is safe. Whenever the call to skel_map_update_elem() is done, the kernel emits an equivalent of a syscall to this kernel function, and as such the actual kernel code behind it treats everything like a kernel code should. Efectively, this use of fds is to pass information from/to userland :) But I agree, the whole logic of this file is weird, and not clean. Luckily, with the help of the bpf folks, I came to a better solution, with bpf_struct_ops[0]. > > Using them as handles for internal objects is an equivalent of playing > in the traffic - think of it as evolution in action. Which now raises the question: should I ask for the bpf_struct_ops conversion to be backported in stable, because that code is too ugly? Cheers, Benjamin [0] https://lore.kernel.org/bpf/20240528-hid_bpf_struct_ops-v1-0-8c6663df27d8@xxxxxxxxxx/