Re: [PATCH v4 bpf-next 2/4] bpf: add mmap() support for BPF_MAP_TYPE_ARRAY

On Sun, Nov 17, 2019 at 4:07 AM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
>
> On 11/17/19 6:57 AM, Andrii Nakryiko wrote:
> > On Fri, Nov 15, 2019 at 5:18 PM Alexei Starovoitov <ast@xxxxxx> wrote:
> >> On 11/15/19 4:13 PM, Daniel Borkmann wrote:
> >>>>> Yeah, only for fd array currently. Question is, if we ever reuse that
> >>>>> map_release_uref callback in future for something else, will we
> >>>>> remember that we earlier missed to add it here? :/
> >>>>
> >>>> What do you mean 'missed to add' ?
> >>>
> >>> Was saying missed to add the inc/put for the uref counter.
> >>>
> >>>> This is mmap path. Anything that needs releasing (like FDs for
> >>>> prog_array or progs for sockmap) cannot be mmap-able.
> >>>
> >>> Right, I meant if in future we ever have another use case outside of it
> >>> for some reason (unrelated to those maps you mention above). Can we
> >>> guarantee this is never going to happen? Seemed less fragile at least to
> >>> maintain proper count here.
> >
> > I don't think we're ever going to allow mmaping anything that contains
> > more than pure data. E.g., we disallow mmaping an array that contains a
> > spin lock for that reason. So I think it's safe to assume that this is not
> > going to happen even for future maps. At least not without some
> > serious consideration before that. So I'm going to keep it as just
> > plain bpf_map_inc for now.
>
> Fair enough, then keep it as it is. The purpose of the uref counter is to
> track whether user space holds a reference to the map, either in the form
> of an fd or an inode in bpf fs, which are the only two ways so far for user
> space to refer to the map; once it hits 0, we perform the map's
> map_release_uref() callback.

To be honest, I don't fully understand why we need both refcnt and
usercnt. Is it there to break circular dependencies for maps that
contain other FDs? That is, once userspace no longer holds any
references, we release the FDs stored in the map, which might decrement
refcnt and so break the reference cycle between the map FD and the
special FDs inside the map? Or is that not the case and there is a
different reason?

Either way, I looked at map creation and the bpf_map_release()
implementation again. map_create() sets usercnt to 1, and
bpf_map_release(), which I assume is called when the map FD is closed,
decrements usercnt. So yes, we do the with_uref() manipulations in the
cases where userspace holds some sort of handle to the map. In that
regard, mmap() falls into the same category, so I'm going to convert
everything mmap-related back to
bpf_map_inc_with_uref()/bpf_map_put_with_uref(), to stay consistent.
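
For illustration, the two-counter scheme can be modeled in plain
userspace C. This is only a toy sketch, not the kernel code: struct
toy_map and the toy_* helpers are made-up names, plain ints stand in for
the kernel's atomic counters, and freeing the map is reduced to setting a
flag. Starting both counters at 1 is an assumption that mirrors the
map_create() behavior described above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy userspace model; all names here are hypothetical, not kernel API. */
struct toy_map {
	int refcnt;        /* every reference, kernel-side or user-side */
	int usercnt;       /* references userspace can name: fd, pin, mmap */
	int uref_releases; /* how many times the release callback ran */
	int freed;         /* set when the last reference is dropped */
	void (*release_uref)(struct toy_map *map); /* optional callback */
};

/* Example callback: just records that it was invoked. */
static void toy_count_release(struct toy_map *map)
{
	map->uref_releases++;
}

/* A freshly created map is owned by one user-visible handle (the fd). */
static void toy_map_init(struct toy_map *map,
			 void (*release_uref)(struct toy_map *map))
{
	map->refcnt = 1;
	map->usercnt = 1;
	map->uref_releases = 0;
	map->freed = 0;
	map->release_uref = release_uref;
}

/* Take a reference; uref != 0 means it is also a user-visible handle. */
static void toy_map_inc(struct toy_map *map, int uref)
{
	map->refcnt++;
	if (uref)
		map->usercnt++;
}

/* Drop a reference taken with the same uref value. */
static void toy_map_put(struct toy_map *map, int uref)
{
	if (uref) {
		assert(map->usercnt > 0);
		/* Last user handle gone: let the map drop what it holds. */
		if (--map->usercnt == 0 && map->release_uref)
			map->release_uref(map);
	}
	assert(map->refcnt > 0);
	if (--map->refcnt == 0)
		map->freed = 1;
}
```

In this model an mmap() would call toy_map_inc(map, 1) and the matching
munmap() toy_map_put(map, 1), so closing the FD while the mapping is
still alive leaves usercnt at 1 and the callback does not run until the
mapping goes away too.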

>
> The fact that some maps make use of it and some others not is an implementation
> detail in my opinion, but the concept itself is generic and can be used by
> whatever map implementation would need it in future. From my perspective not
> breaking with this semantic would allow to worry about one less issue once
> this callback gets reused for whatever reason.
>
> As I understand, from your PoV, you think that this uref counter is and will
> be exactly only tied to the fd based maps that currently use it and given
> they understandably won't ever need a mmap interface we don't need to inc/dec
> it there.
>
> Fair enough, but could we add some assertion then which adds a check if a map
> ever uses both that we bail out so we don't forget about this detail in a few
> weeks from now? Given complexity we have in our BPF codebase these days, I'm
> mainly worried about the latter if we can catch such details with a trivial
> check easily, for example, it would be trivial enough to add a test for the
> existence of map_release_uref callback inside bpf_map_mmap() and bail out in
> order to guarantee this, similar as you do with the spinlock.
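
A minimal sketch of what such a check could look like, written as a
userspace toy rather than as the actual bpf_map_mmap() code. The
toy_map, toy_map_ops and has_spin_lock names, as well as the error codes
chosen, are assumptions made for the example.

```c
#include <errno.h>
#include <stddef.h>

/* Toy userspace model; all names here are hypothetical, not kernel API. */
struct toy_map;

struct toy_map_ops {
	void (*map_release_uref)(struct toy_map *map); /* optional */
	int (*map_mmap)(struct toy_map *map);          /* optional */
};

struct toy_map {
	const struct toy_map_ops *ops;
	int has_spin_lock; /* non-zero if values embed a spin lock */
};

/* Placeholder callbacks used to build example ops tables. */
static void toy_noop_release(struct toy_map *map)
{
	(void)map;
}

static int toy_noop_mmap(struct toy_map *map)
{
	(void)map;
	return 0;
}

/* Generic mmap entry point with the suggested guard. */
static int toy_map_mmap(struct toy_map *map)
{
	/* No mmap support, or the value is not pure data. */
	if (!map->ops->map_mmap || map->has_spin_lock)
		return -ENOTSUP;

	/*
	 * Guard: a map type that relies on map_release_uref must not be
	 * mmap-able while the mmap path skips the uref counter.
	 */
	if (map->ops->map_release_uref)
		return -EINVAL;

	return map->ops->map_mmap(map);
}
```

With a guard like this, a map type that later gains both callbacks is
rejected at mmap() time instead of silently skipping the uref
accounting.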
>
> > I'm going to convert bpf_prog_add/bpf_prog_inc, though, and will do it
> > as a separate patch, on top of the bpf_map_inc refactor. It touches quite
> > a lot of drivers, so it would benefit from being separate.
>
> Yeah, sounds good to me. Thanks for converting!
>
> >> I'm struggling to understand the concern.
> >> map-in-map, xskmap, socket local storage are doing bpf_map_inc(, false)
> >> when they need to hold the map. Why this case is any different?
>
> (See above.)


