On Wed, Sep 22, 2021 at 12:34:41AM IST, Alexei Starovoitov wrote:
> On Mon, Sep 20, 2021 at 9:50 PM Kumar Kartikeya Dwivedi
> <memxor@xxxxxxxxx> wrote:
> >
> > On Tue, Sep 21, 2021 at 06:27:16AM IST, Alexei Starovoitov wrote:
> > > On Mon, Sep 20, 2021 at 7:15 AM Kumar Kartikeya Dwivedi
> > > <memxor@xxxxxxxxx> wrote:
> > > >
> > > > This change updates the BPF syscall loader to relocate BTF_KIND_FUNC
> > > > relocations, with support for weak kfunc relocations. The next commit
> > > > adds bpftool support to set up the fd_array_sz parameter for light
> > > > skeleton.
> > > >
> > > > A second map for keeping fds is used instead of adding fds to existing
> > > > loader.map because of the following reasons:
> > >
> > > but it complicates signing bpf progs a lot.
> > >
> >
> > Can you explain this in short? (Just want to understand why it would be a
> > problem).
>
> The signing idea (and light skeleton too) rely on two matching blocks:
> signed map and signed prog that operates on this map.
> They have to match and be technically part of a single logical signature
> that consists of two pieces.
> The second map doesn't quite fit this model. Especially since it's an empty
> map and it is there for temporary use during execution of the loader prog.
> That fd_array_sz value would somehow need to be part of the signature.
> Adding a 3rd non-generic component to a signature has consequences
> to the whole signing process.
> The loader prog could have created this temp map on its own
> without asking bpf_load_and_run() to do it and without exposing it
> into a signature.
> Anyway the signed bpf progs may get solved differently with the latest John
> proposal, but that's a different discussion.
> The light skeleton minimalism is its main advantage. Keeping it two
> pieces: one map and one prog is its main selling point.
>
> > > > If reserving an area for map and BTF fds, we would waste the remainder
> > > > of (MAX_USED_MAPS + MAX_KFUNC_DESCS) * sizeof(int), which in most cases
> > > > will be unused by the program. Also, we must place some limit on the
> > > > number of map and BTF fds a program can possibly open.
> > >
> > > That is just (256 + 64)*4 bytes of data. Really not much.
> > > I wouldn't worry about reserving this space.
> > >
> >
> > Ok, I'll probably go with this now, I didn't realise a separate fd would be
> > prohibitive for the signing case, so I thought it would be nice to lift the
> > limitation on number of map_fds by packing fd_array fds in another map.
> >
> > > > If setting gen->fd_array to first map_fd offset, and then just finding
> > > > the offset relative to this (for later BTF fds), such that they can be
> > > > packed without wasting space, we run the risk of unnecessarily running
> > > > out of valid offset for emit_relo stage (for kfuncs), because gen map
> > > > creation and relocation stages are separated by other steps that can add
> > > > lots of data (including bpf_object__populate_internal_map). It is also
> > > > prone to break silently if features are added between map and BTF fd
> > > > emits that possibly add more data (just ~128KB to break BTF fd, since
> > > > insn->off allows for INT16_MAX (32767) * 4 bytes).
> > >
> > > I don't follow this logic.
> > >
> > > > Both of these issues are compounded by the fact that the data map is
> > > > shared by all programs, so it is easy to end up with invalid offset for
> > > > BTF fd.
> > >
> > > I don't follow this either. There is only one map and one program.
> > > What sharing are you talking about?
> >
> > What I saw was that the sequence of calls is like this:
> > bpf_gen__map_create
> > add_data - from first emit we add map_fd, we also store gen->fd_array
> > then libbpf would call bpf_object__populate_internal_map
> > which calls bpf_gen__map_update_elem, which also does add_data (can be of
> > arbitrary sizes).
> >
> > emit_relos happens relatively at the end.
> > For each program in the object, this sequence can be repeated, such that the
> > add_data that we do in emit_relos, relative offset from gen->fd_array offset
> > can end up becoming big enough (as all programs in object add data to same map),
> > while gen->fd_array comes from first map creation.
>
> You meant to use fd_array as a very very sparse array
> with giant gaps between valid map_fds and btf_fds. Now I see it :)
> Indeed in such a case there is a risk of running out of 16-bit in bpf_insn->off.
> Reserving (256 + 64)*4 in the beginning of the data map should solve it, right?
> The loader prog can create a 2nd auxiliary map on the fly,
> but it seems easier and simpler to just reserve this space in one and only map.

Thanks for the explanation! It makes sense. I will fix this in the next spin.

--
Kartikeya