Re: [PATCH hid v12 05/15] HID: bpf jmp table: simplify the logic of cleaning up programs

Benjamin Tissoires <benjamin.tissoires@xxxxxxxxxx> · Tue, 13 Dec 2022 08:59:02 +0100

On Tue, Dec 13, 2022 at 7:28 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, Dec 12, 2022 at 10:39:26AM -0800, Yonghong Song wrote:
> >
> >
> > On 12/12/22 10:20 AM, Greg KH wrote:
> > > On Mon, Dec 12, 2022 at 09:52:03AM -0800, Yonghong Song wrote:
> > > >
> > > >
> > > > On 12/12/22 9:02 AM, Benjamin Tissoires wrote:
> > > > > On Thu, Nov 3, 2022 at 4:58 PM Benjamin Tissoires
> > > > > <benjamin.tissoires@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > Kind of a hack, but works for now:
> > > > > >
> > > > > > Instead of listening for any close of eBPF program, we now
> > > > > > decrement the refcount when we insert it in our internal
> > > > > > map of fd progs.
> > > > > >
> > > > > > This is safe to do because:
> > > > > > - we listen to any call of destructor of programs
> > > > > > - when a program is being destroyed, we disable it by removing
> > > > > >     it from any RCU list used by any HID device (so it will never
> > > > > >     be called)
> > > > > > - we then trigger a job to cleanup the prog fd map, but we overwrite
> > > > > >     the removal of the elements to not do anything on the programs, just
> > > > > >     remove the allocated space
> > > > > >
> > > > > > This is better than previously because we can remove the map of known
> > > > > > programs and their usage count. We now rely on the refcount of
> > > > > > bpf, which has greater chances of being accurate.
> > > > > >
> > > > > > Signed-off-by: Benjamin Tissoires <benjamin.tissoires@xxxxxxxxxx>
> > > > > >
> > > > > > ---
> > > > >
> > > > > So... I am a little bit embarrassed, but it turns out that this hack
> > > > > is not safe enough.
> > > > >
> > > > > If I compile the kernel with LLVM=1, the function
> > > > > bpf_prog_put_deferred() is optimized in a weird way: if we are not in
> > > > > irq, the function is inlined into __bpf_prog_put(), but if we are, the
> > > > > function is still kept around as it is called in a scheduled work
> > > > > item.
> > > > >
> > > > > This is something I completely overlooked: I assume that if the
> > > > > function would be inlined, the HID entrypoint BPF preloaded object
> > > > > would not be able to bind, thus deactivating HID-BPF safely. But if a
> > > > > function can be both inlined and not inlined, then I have no
> > > > > guarantees that my cleanup call will be called. Meaning that a HID
> > > > > device might believe there is still a bpf function to call. And things
> > > > > will get messy, with kernel crashes and others.
> > > >
> > > > You should not rely fentry to a static function. This is unstable
> > > > as compiler could inline it if that static function is called
> > > > directly. You could attach to a global function if it is not
> > > > compiled with lto.
> > >
> > > But now that the kernel does support LTO, how can you be sure this will
> > > always work properly?  The code author does not know if LTO will kick in
> > > and optimize this away or not, that's the linker's job.
> >
> > Ya, that is right. So for in-kernel bpf programs, attaching to global
> > functions are not safe either. For other not-in-kernel bpf programs, it
> > may not work but that is user's responsibility to adjust properly
> > (to different functions based on a particular build, etc.).
>
> So if in-kernel bpf programs will not work or are not safe, how will
> in-kernel bpf programs properly attach?

Sorry if that wasn't clear. Loading a bpf program from the kernel is
fine and safe. What wasn't safe was the way I used it in this patch.

In my case, using HID-BPF to fix devices is safe (whether the program
is loaded from the kernel or from userspace): the bpf JIT/verifier
ensures that there are no out-of-bounds reads/writes and the API is
properly defined. But the problem I am facing with the generic bpf
implementation is that it is designed for global processing attached
to one given function, whereas I wanted to attach to a
(function, device) pair.

So in this patch, I actually abused BPF to get free event
notifications when the bpf program was released.
The first implementation (HID: initial BPF implementation) was safer
than this patch because I was using BPF for notifications of my
internals but I wasn't messing with the reference count. So if I
did not get the events, I wouldn't decrement the bpf_prog refcount,
and the end result would be that the bpf program stays attached to
the device forever. Not user friendly, but it doesn't introduce a
kernel crash.

However, this patch is messing with the reference counting of an
internal kernel object, assuming I would always get the event. That is
not the case, so I get read-after-free errors.
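
To make the difference between the two approaches concrete, here is a
userspace toy model. It is only a sketch, not kernel code: every name
in it (fake_prog, fake_hid_slot, simulate(), ...) is made up for
illustration, and "freed" is just a flag instead of a real free. It
assumes userspace holds one reference through its fd, and that the
release notification may or may not be delivered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT kernel types. */
struct fake_prog {
	int refcnt;	/* models the bpf program reference count */
	bool freed;	/* set once refcnt reaches 0 */
};

struct fake_hid_slot {
	struct fake_prog *prog;	/* what the HID device would call */
};

enum outcome { CLEAN, LEAKED, DANGLING };

static void fake_prog_put(struct fake_prog *p)
{
	assert(p->refcnt > 0);
	if (--p->refcnt == 0)
		p->freed = true;	/* a real kernel would release memory */
}

static void attach(struct fake_hid_slot *s, struct fake_prog *p,
		   bool drop_ref_at_insert)
{
	p->refcnt++;			/* reference taken while attaching */
	s->prog = p;
	if (drop_ref_at_insert)
		fake_prog_put(p);	/* this patch: give it back right away */
}

/*
 * drop_ref_at_insert == false models the first implementation,
 * drop_ref_at_insert == true models this patch.
 * event_delivered == false models the hook never firing.
 */
enum outcome simulate(bool drop_ref_at_insert, bool event_delivered)
{
	struct fake_prog prog = { .refcnt = 1, .freed = false };
	struct fake_hid_slot slot = { .prog = NULL };

	attach(&slot, &prog, drop_ref_at_insert);

	/* Userspace closes its fd and drops its reference. */
	fake_prog_put(&prog);

	if (event_delivered) {
		slot.prog = NULL;		/* detach from the device */
		if (!drop_ref_at_insert)
			fake_prog_put(&prog);	/* release the slot's own ref */
	}

	if (slot.prog && slot.prog->freed)
		return DANGLING;	/* device still points at a dead prog */
	if (slot.prog)
		return LEAKED;		/* prog stays attached forever */
	return CLEAN;
}
```

In this model, simulate(false, false) returns LEAKED (the first
implementation with a missed event), while simulate(true, false)
returns DANGLING (this patch with a missed event). When the event is
delivered, both variants return CLEAN.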

TL;DR: (ab)using BPF internally for kernel introspection to manage
kernel structures is just plain wrong. Mea culpa.

Cheers,
Benjamin