Re: [PATCH bpf-next v4 08/11] libbpf: Update gen_loader to emit BTF_KIND_FUNC relocations

Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> · Wed, 22 Sep 2021 03:55:21 +0530

On Wed, Sep 22, 2021 at 12:34:41AM IST, Alexei Starovoitov wrote:
> On Mon, Sep 20, 2021 at 9:50 PM Kumar Kartikeya Dwivedi
> <memxor@xxxxxxxxx> wrote:
> >
> > On Tue, Sep 21, 2021 at 06:27:16AM IST, Alexei Starovoitov wrote:
> > > On Mon, Sep 20, 2021 at 7:15 AM Kumar Kartikeya Dwivedi
> > > <memxor@xxxxxxxxx> wrote:
> > > >
> > > > This change updates the BPF syscall loader to relocate BTF_KIND_FUNC
> > > > relocations, with support for weak kfunc relocations. The next commit
> > > > adds bpftool supports to set up the fd_array_sz parameter for light
> > > > skeleton.
> > > >
> > > > A second map for keeping fds is used instead of adding fds to existing
> > > > loader.map because of following reasons:
> > >
> > > but it complicates signing bpf progs a lot.
> > >
> >
> > Can you explain this in short? (Just want to understand why it would be
> > problem).
>
> The signing idea (and light skeleton too) rely on two matching blocks:
> signed map and signed prog that operates on this map.
> They have to match and be technically part of single logical signature
> that consists of two pieces.
> The second map doesn't quite fit this model. Especially since it's an empty
> map and it is there for temporary use during execution of the loader prog.
> That fd_array_sz value would somehow need to be part of the signature.
> Adding a 3rd non-generic component to a signature has consequences
> to the whole signing process.
> The loader prog could have created this temp map on its own
> without asking bpf_load_and_run() to do it and without exposing it
> into a signature.
> Anyway the signed bpf progs may get solved differently with the latest John
> proposal, but that's a different discussion.
> The light skeleton minimalizm is its main advantage. Keeping it two
> pieces: one map and one prog is its main selling point.
>
> > > > If reserving an area for map and BTF fds, we would waste the remaining
> > > > of (MAX_USED_MAPS + MAX_KFUNC_DESCS) * sizeof(int), which in most cases
> > > > will be unused by the program. Also, we must place some limit on the
> > > > amount of map and BTF fds a program can possibly open.
> > >
> > > That is just (256 + 64)*4 bytes of data. Really not much.
> > > I wouldn't worry about reserving this space.
> > >
> >
> > Ok, I'll probably go with this now, I didn't realise a separate fd would be
> > prohibitive for the signing case, so I thought it would nice to lift the
> > limiation on number of map_fds by packing fd_array fds in another map.
> >
> > > > If setting gen->fd_array to first map_fd offset, and then just finding
> > > > the offset relative to this (for later BTF fds), such that they can be
> > > > packed without wasting space, we run the risk of unnecessarily running
> > > > out of valid offset for emit_relo stage (for kfuncs), because gen map
> > > > creation and relocation stages are separated by other steps that can add
> > > > lots of data (including bpf_object__populate_internal_map). It is also
> > > > prone to break silently if features are added between map and BTF fd
> > > > emits that possibly add more data (just ~128KB to break BTF fd, since
> > > > insn->off allows for INT16_MAX (32767) * 4 bytes).
> > >
> > > I don't follow this logic.
> > >
> > > > Both of these issues are compounded by the fact that data map is shared
> > > > by all programs, so it is easy to end up with invalid offset for BTF fd.
> > >
> > > I don't follow this either. There is only one map and one program.
> > > What sharing are you talking about?
> >
> > What I saw was that the sequence of calls is like this:
> > bpf_gen__map_create
> > add_data - from first emit we add map_fd, we also store gen->fd_array
> > then libbpf would call bpf_object__populate_internal_map
> > which calls bpf_gen__map_update_elem, which also does add_data (can be of
> > arbitrary sizes).
> >
> > emit_relos happens relatively at the end.
> > For each program in the object, this sequence can be repeated, such that the
> > add_data that we do in emit_relos, relative offset from gen->fd_array offset
> > can end up becoming big enough (as all programs in object add data to same map),
> > while gen->fd_array comes from first map creation.
>
> You've meant to use fd_array as a very very sparse array
> with giant gaps between valid map_fds and btf_fds. Now I see it :)
> Indeed in such a case there is a risk of running out of 16-bit in bpf_insn->off.
> Reserving (256 + 64)*4 in the beginning of the data map should solve it, right?
> The loader prog can create a 2nd auxiliary map on the fly,
> but it seems easier and simpler to just reserve this space in one and only map.

Thanks for the explanation! It makes sense. I will fix this in the next spin.

--
Kartikeya