On Fri, Nov 15, 2019 at 3:24 PM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
>
> On 11/15/19 5:02 AM, Andrii Nakryiko wrote:
> > 92117d8443bc ("bpf: fix refcnt overflow") turned refcounting of bpf_map into
> > a potentially failing operation, when the refcount reaches the BPF_MAX_REFCNT
> > limit (32k). Due to using a 32-bit counter, it's possible in practice to
> > overflow the refcounter and make it wrap around to 0, causing an erroneous
> > map free while there are still references to it, causing use-after-free
> > problems.
>
> You mention 'it's possible in practice' to overflow in relation to bpf map
> refcount, do we need any fixing for bpf tree here?

I meant without the existing 32k limit. With the limit we currently have, no,
we'll just fail bpf_map_inc instead. So no need to do anything about the bpf
tree.

> > But having failing refcounting operations is problematic in some cases. One
> > example is the mmap() interface. After establishing an initial memory
> > mapping, the user is allowed to arbitrarily map/remap/unmap parts of the
> > mapped memory, arbitrarily splitting it into multiple non-contiguous
> > regions. All of this happens without any control from the users of the mmap
> > subsystem. Rather, the mmap subsystem sends notifications to the original
> > creator of the memory mapping through open/close callbacks, which are
> > optionally specified during initial memory mapping creation. These
> > callbacks are used to maintain an accurate refcount for bpf_map (see next
> > patch in this series). The problem is that the open() callback is not
> > supposed to fail, because the memory-mapped resource is set up and properly
> > referenced. This poses a problem for using memory-mapping with BPF maps.
> >
> > One solution to this is to maintain a separate refcount for just memory
> > mappings and do a single bpf_map_inc/bpf_map_put when it goes from/to zero,
> > respectively. There are similar use cases in the current work on tcp-bpf,
> > necessitating an extra counter as well. This seems like a rather unfortunate
> > and ugly solution that doesn't scale well to various new use cases.
> >
> > Another approach to solve this is to use the non-failing refcount_t type,
> > which uses a 32-bit counter internally, but, once reaching the overflow
> > state at UINT_MAX, stays there. This ultimately causes a memory leak, but
> > prevents use-after-free.
>
> [...]
>
> > +/* prog's refcnt limit */
> > +#define BPF_MAX_REFCNT 32768
> > +
>
> Would it make sense in this context to also convert the prog refcount over to
> atomic64, so we have both in one go in order to align them again? This could
> likely simplify even more wrt error paths.

Yes, of course. I was just trying to minimize the amount of changes, but I can
go and do the same for the prog refcnt.

> >  struct bpf_prog *bpf_prog_add(struct bpf_prog *prog, int i)
> >  {
> >  	if (atomic_add_return(i, &prog->aux->refcnt) > BPF_MAX_REFCNT) {