On Tue, Dec 13, 2022 at 7:28 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote: > > On Mon, Dec 12, 2022 at 10:39:26AM -0800, Yonghong Song wrote: > > > > > > On 12/12/22 10:20 AM, Greg KH wrote: > > > On Mon, Dec 12, 2022 at 09:52:03AM -0800, Yonghong Song wrote: > > > > > > > > > > > > On 12/12/22 9:02 AM, Benjamin Tissoires wrote: > > > > > On Thu, Nov 3, 2022 at 4:58 PM Benjamin Tissoires > > > > > <benjamin.tissoires@xxxxxxxxxx> wrote: > > > > > > > > > > > > Kind of a hack, but works for now: > > > > > > > > > > > > Instead of listening for any close of eBPF program, we now > > > > > > decrement the refcount when we insert it in our internal > > > > > > map of fd progs. > > > > > > > > > > > > This is safe to do because: > > > > > > - we listen to any call of destructor of programs > > > > > > - when a program is being destroyed, we disable it by removing > > > > > > it from any RCU list used by any HID device (so it will never > > > > > > be called) > > > > > > - we then trigger a job to cleanup the prog fd map, but we overwrite > > > > > > the removal of the elements to not do anything on the programs, just > > > > > > remove the allocated space > > > > > > > > > > > > This is better than previously because we can remove the map of known > > > > > > programs and their usage count. We now rely on the refcount of > > > > > > bpf, which has greater chances of being accurate. > > > > > > > > > > > > Signed-off-by: Benjamin Tissoires <benjamin.tissoires@xxxxxxxxxx> > > > > > > > > > > > > --- > > > > > > > > > > So... I am a little bit embarrassed, but it turns out that this hack > > > > > is not safe enough. > > > > > > > > > > If I compile the kernel with LLVM=1, the function > > > > > bpf_prog_put_deferred() is optimized in a weird way: if we are not in > > > > > irq, the function is inlined into __bpf_prog_put(), but if we are, the > > > > > function is still kept around as it is called in a scheduled work > > > > > item. > > > > > > > > > > This is something I completely overlooked: I assume that if the > > > > > function would be inlined, the HID entrypoint BPF preloaded object > > > > > would not be able to bind, thus deactivating HID-BPF safely. But if a > > > > > function can be both inlined and not inlined, then I have no > > > > > guarantees that my cleanup call will be called. Meaning that a HID > > > > > device might believe there is still a bpf function to call. And things > > > > > will get messy, with kernel crashes and others. > > > > > > > > You should not rely fentry to a static function. This is unstable > > > > as compiler could inline it if that static function is called > > > > directly. You could attach to a global function if it is not > > > > compiled with lto. > > > > > > But now that the kernel does support LTO, how can you be sure this will > > > always work properly? The code author does not know if LTO will kick in > > > and optimize this away or not, that's the linker's job. > > > > Ya, that is right. So for in-kernel bpf programs, attaching to global > > functions are not safe either. For other not-in-kernel bpf programs, it > > may not work but that is user's responsibility to adjust properly > > (to different functions based on a particular build, etc.). > > So if in-kernel bpf programs will not work or are not safe, how will > in-kernel bpf programs properly attach? Sorry if that wasn't clear. Loading a bpf program from the kernel is fine and safe. But it was the use of it that wasn't. In my case, HID-BPF to fix devices is safe (whether the program is loaded from the kernel or from userspace): the bpf JIT/verifier ensures that there are no out of bound read/write and the API is properly defined. But the problem I am facing with the generic bpf implementation is that it is made to be a global processing and to attach to one given function, when I wanted to have a couple function + device. So in this patch, I actually abused BPF to get free event notifications when the bpf program was released. The first implementation (HID: initial BPF implementation) was safer than this patch because I was using BPF for notifications of my internals but I wasn't messing up with the reference count. So if I did not get the events, I wouldn't decrement the bpf_prog and the end result means that the bpf program would stay forever attached to the device. Not user friendly but it doesn't introduce a kernel crash. However, this patch is messing with reference counting of an internal kernel object assuming I would always get the event. This is not the case and so I get read after free errors. TL;DR: (ab)using BPF internally for kernel introspection to manage kernel structures is just plain wrong. Mea culpa. Cheers, Benjamin