On Fri, Dec 11, 2020 at 1:27 PM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Thu, Dec 10, 2020 at 08:27:32PM -0800, Andrii Nakryiko wrote: > > During BPF program load time, verifier will resolve FD to BTF object and will > > take reference on BTF object itself and, for module BTFs, corresponding module > > as well, to make sure it won't be unloaded from under running BPF program. The > > mechanism used is similar to how bpf_prog keeps track of used bpf_maps. > ... > > + > > + /* if we reference variables from kernel module, bump its refcount */ > > + if (btf_is_module(btf)) { > > + btf_mod->module = btf_try_get_module(btf); > > Is it necessary to refcnt the module? Correct me if I'm wrong, but > for module's BTF we register a notifier. Then the module can be rmmod-ed > at any time and we will do btf_put() for corresponding BTF, but that BTF may > stay around because bpftool or something is looking at it. Correct, BTF object itself doesn't take a refcnt on module. > Similarly when prog is attached to raw_tp in a module we currently do try_module_get(), > but is it really necessary ? When bpf is attached to a netdev the netdev can > be removed and the link will be dangling. May be it makes sense to do the same > with modules? The raw_tp can become dangling after rmmod and the prog won't be So for raw_tp it's not the case today. I tested, I attached raw_tp, kept triggering it in a loop, and tried to rmmod bpf_testmod. It failed, because raw tracepoint takes refcnt on module. rmmod -f bpf_testmod also didn't work, but it's because my kernel wasn't built with force-unload enabled for modules. But force-unload is an entirely different matter and it's inherently dangerous to do, it can crash and corrupt anything in the kernel. > executed anymore. So hard coded address of a per-cpu var in a ksym will > be pointing to freed mod memory after rmmod, but that's ok, since that prog will > never execute. Not so fast :) Indeed, if somehow module gets unloaded while we keep BPF program loaded, we'll point to unallocated memory **OR** to a memory re-used for something else. That's bad. Nothing will crash even if it's unmapped memory (due to bpf_probe_read semantics), but we will potentially be reading some garbage (not zeroes), if some other module re-uses that per-CPU memory. As for the BPF program won't be triggered. That's not true in general, as you mention yourself below. > On the other side if we envision a bpf prog attaching to a vmlinux function > and accessing per-cpu or normal ksym in some module it would need to inc refcnt > of that module, since we won't be able to guarantee that this prog will > not execute any more. So we cannot allow dangling memory addresses. That's what my new selftest is doing actually. It's a generic sys_enter raw_tp, which doesn't attach to the module, but it does read module's per-CPU variable. So I actually ran a test before posting. I successfully unloaded bpf_testmod, but kept running the prog. And it kept returning *correct* per-CPU value. Most probably due to per-CPU memory not unmapped and not yet reused for something else. But it's a really nasty and surprising situation. Keep in mind, also, that whenever BPF program declares per-cpu variable extern, it doesn't know or care whether it will get resolved to built-in vmlinux per-CPU variable or module per-CPU variable. Restricting attachment to only module-provided hooks is both tedious and might be quite surprising sometimes, seems not worth the pain. > If latter is what we want to allow then we probably need a test case for it and > document the reasons for keeping modules pinned while progs access their data. > Since such pinning behavior is different from other bpf attaching cases where > underlying objects (like netdev and cgroup) can go away. See above, that's already the case for module tracepoints. So in summary, I think we should take a refcnt on module, as that's already the case for stuff like raw_tp. I can add more comments to make this clear, of course.