On 11/17/19 6:17 PM, Andrii Nakryiko wrote:
On Sun, Nov 17, 2019 at 4:07 AM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
On 11/17/19 6:57 AM, Andrii Nakryiko wrote:
On Fri, Nov 15, 2019 at 5:18 PM Alexei Starovoitov <ast@xxxxxx> wrote:
On 11/15/19 4:13 PM, Daniel Borkmann wrote:
Yeah, only for fd array currently. Question is, if we ever reuse that
map_release_uref
callback in future for something else, will we remember that we earlier
missed to add
it here? :/
What do you mean 'missed to add' ?
Was saying missed to add the inc/put for the uref counter.
This is mmap path. Anything that needs releasing (like FDs for
prog_array or progs for sockmap) cannot be mmap-able.
Right, I meant if in future we ever have another use case outside of it
for some reason (unrelated to those maps you mention above). Can we
guarantee this is never going to happen? Seemed less fragile at least to
maintain proper count here.
I don't think we'll ever going to allow mmaping anything that contains
not just pure data. E.g., we disallow mmaping array that contains spin
lock for that reason. So I think it's safe to assume that this is not
going to happen even for future maps. At least not without some
serious considerations before that. So I'm going to keep it as just
plain bpf_map_inc for now.
Fair enough, then keep it as it is. The purpose of the uref counter is to
track whatever map holds a reference either in form of fd or inode in bpf
fs which are the only two things till now where user space can refer to the
map, and once it hits 0, we perform the map's map_release_uref() callback.
To be honest, I don't exactly understand why we need both refcnt and
usercnt. Does it have anything to do with some circular dependencies
for those maps containing other FDs? And once userspace doesn't have
any more referenced, we release FDs, which might decrement refcnt,
thus breaking circular refcnt between map FD and special FDs inside a
map? Or that's not the case and there is a different reason?
Yes, back then we added it to break up circular dependencies in relation to
tail calls which is why these are pinned before the loader process finishes
(e.g. as in Cilium's case).
Either way, I looked at map creation and bpf_map_release()
implementation again. map_create() does set usercnt to 1, and
bpf_map_release(), which I assume is called when map FD is closed,
does decrement usercnt, so yeah, we do with_uref() manipulations for
cases when userspace maintains some sort of handle to map. In that
regard, mmap() does fall into the same category, so I'm going to
convert everything mmap-related back to
bpf_map_inc_with_uref()/bpf_map_put_with_uref(), to stay consistent.
Ok, fair enough, either way would have been fine.
Thanks a lot,
Daniel