On Wed, Feb 7, 2024 at 5:38 PM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Wed, Feb 7, 2024 at 5:15 PM Andrii Nakryiko > <andrii.nakryiko@xxxxxxxxx> wrote: > > > > On Tue, Feb 6, 2024 at 2:05 PM Alexei Starovoitov > > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > > > From: Alexei Starovoitov <ast@xxxxxxxxxx> > > > > > > mmap() bpf_arena right after creation, since the kernel needs to > > > remember the address returned from mmap. This is user_vm_start. > > > LLVM will generate bpf_arena_cast_user() instructions where > > > necessary and JIT will add upper 32-bit of user_vm_start > > > to such pointers. > > > > > > Use traditional map->value_size * map->max_entries to calculate mmap sz, > > > though it's not the best fit. > > > > We should probably make bpf_map_mmap_sz() aware of specific map type > > and do different calculations based on that. It makes sense to have > > round_up(PAGE_SIZE) for BPF map arena, and use just just value_size or > > max_entries to specify the size (fixing the other to be zero). > > I went with value_size == key_size == 8 in order to be able to extend > it in the future and allow map_lookup/update/delete to do something > useful. Ex: lookup/delete can behave just like arena_alloc/free_pages. > > Are you proposing to force key/value_size to zero ? Yeah, I was thinking either (value_size=<size-in-bytes> and max_entries=0) or (value_size=0 and max_entries=<size-in-bytes>). The latter is what we do for BPF ringbuf, for example. What you are saying about lookup/update already seems different from any "normal" map anyways, so I'm not sure that's a good enough reason to have hard-coded 8 for value size. And it seems like in practice instead of doing lookup/update through syscall, the more natural way of working with arena is going to be mmap() anyways, so not even sure we need to implement the syscall side of lookup/update. But just as an extra aside, what you have in mind for lookup/update for the arena map can be generalized into "partial lookup/update" for any map where it makes sense. I.e., instead of expecting the user to read/update the entire value size, we can allow them to provide a subrange to read/update (i.e., offset+size combo to specify subrange within full map value range). This will work for the arena, but also for most other maps (if not all) that currently support LOOKUP/UPDATE. but specifically for bpf_map_mmap_sz(), regardless of what we decide we should still change it to be something like: switch (map_type) { case BPF_MAP_TYPE_ARRAY: return <whatever we are doing right now>; case BPF_MAP_TYPE_ARENA: /* round up to page size */ return round_up(<whatever based on value_size and/or max_entries>, page_size); default: return 0; /* not supported */ } we can also add a still different case for RINGBUF, where it's (2 * max_entries). The general point is that mmapable size rules differ by map type, so we best express this explicitly in this helper. > That was my first attempt. > key_size can be zero, but syscall side of lookup/update expects > a non-zero value_size for all maps regardless of type. > We can modify bpf/syscall.c, of course, but it feels arena would be > too different of a map if generic map handling code would need > to be specialized. > > Then since value_size is > 0 then what sizes make sense? > When it's 8 it can be an indirection to anything. > key/value would be user pointers to other structs that > would be meaningful for an arena. > Right now it costs nothing to force both to 8 and pick any logic > when we decide what lookup/update should do. > > But then when value_size == 8 than making max_entries to > mean the size of arena in bytes or pages.. starting to look odd > and different from all other maps. > > We could go with max_entries==0 and value_size to mean the size of > arena in bytes, but it will prevent us from defining lookup/update > in the future, which doesn't feel right. > > Considering all this I went with map->value_size * map->max_entries choice. > Though it's not pretty. > > > > @@ -4908,6 +4910,22 @@ static int bpf_object__create_map(struct bpf_object *obj, struct bpf_map *map, b > > > if (map->fd == map_fd) > > > return 0; > > > > > > + if (def->type == BPF_MAP_TYPE_ARENA) { > > > + size_t mmap_sz; > > > + > > > + mmap_sz = bpf_map_mmap_sz(def->value_size, def->max_entries); > > > + map->mmaped = mmap((void *)map->map_extra, mmap_sz, PROT_READ | PROT_WRITE, > > > + map->map_extra ? MAP_SHARED | MAP_FIXED : MAP_SHARED, > > > + map_fd, 0); > > > + if (map->mmaped == MAP_FAILED) { > > > + err = -errno; > > > + map->mmaped = NULL; > > > + pr_warn("map '%s': failed to mmap bpf_arena: %d\n", > > > + bpf_map__name(map), err); > > > + return err; > > > > leaking map_fd here, you need to close(map_fd) before erroring out > > ahh. good catch.