On Wed, Nov 13, 2019 at 12:06 PM John Fastabend <john.fastabend@xxxxxxxxx> wrote:
>
> Andrii Nakryiko wrote:
> > Add ability to memory-map contents of BPF array map. This is extremely useful
> > for working with BPF global data from userspace programs. It allows to avoid
> > typical bpf_map_{lookup,update}_elem operations, improving both performance
> > and usability.
> >
> > There had to be special considerations for map freezing, to avoid having
> > writable memory view into a frozen map. To solve this issue, map freezing and
> > mmap-ing is happening under mutex now:
> >   - if map is already frozen, no writable mapping is allowed;
> >   - if map has writable memory mappings active (accounted in map->writecnt),
> >     map freezing will keep failing with -EBUSY;
> >   - once number of writable memory mappings drops to zero, map freezing can be
> >     performed again.
> >
> > Only non-per-CPU plain arrays are supported right now. Maps with spinlocks
> > can't be memory mapped either.
> >
> > For BPF_F_MMAPABLE array, memory allocation has to be done through vmalloc()
> > to be mmap()'able. We also need to make sure that array data memory is
> > page-sized and page-aligned, so we over-allocate memory in such a way that
> > struct bpf_array is at the end of a single page of memory with array->value
> > being aligned with the start of the second page. On deallocation we need to
> > accommodate this memory arrangement to free vmalloc()'ed memory correctly.
> >
> > Cc: Rik van Riel <riel@xxxxxxxxxxx>
> > Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> > Acked-by: Song Liu <songliubraving@xxxxxx>
> > Signed-off-by: Andrii Nakryiko <andriin@xxxxxx>
>
> [...]
>
> > @@ -102,10 +106,20 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
> >  	}
> >
> >  	array_size = sizeof(*array);
> > -	if (percpu)
> > +	if (percpu) {
> >  		array_size += (u64) max_entries * sizeof(void *);
> > -	else
> > -		array_size += (u64) max_entries * elem_size;
> > +	} else {
> > +		/* rely on vmalloc() to return page-aligned memory and
> > +		 * ensure array->value is exactly page-aligned
> > +		 */
> > +		if (attr->map_flags & BPF_F_MMAPABLE) {
> > +			array_size = round_up(array_size, PAGE_SIZE);
> > +			array_size += (u64) max_entries * elem_size;
> > +			array_size = round_up(array_size, PAGE_SIZE);
> > +		} else {
> > +			array_size += (u64) max_entries * elem_size;
> > +		}
> > +	}
>
> Thought about this chunk for a bit, assuming we don't end up with lots of
> small mmap arrays it should be OK. So userspace will probably need to try and
> optimize this to create as few mmaps as possible.

I think most explicitly declared maps typically won't be BPF_F_MMAPABLE,
unless the user really expects to mmap() them for use from user-space. For
global data, though, being able to mmap() is a big win, which is why I'm
making those maps BPF_F_MMAPABLE by default, if possible.

>
> [...]
>
> Acked-by: John Fastabend <john.fastabend@xxxxxxxxx>
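
To put rough numbers on the small-array overhead discussed above, here is a
minimal userspace sketch of the non-per-CPU sizing from the quoted hunk. It is
not kernel code: the sketch_* names and SKETCH_PAGE_SIZE (assumed to be 4096)
are hypothetical, header_size stands in for sizeof(*array), and elem_size is
assumed to be already rounded the way the kernel would round it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for illustration; the kernel uses PAGE_SIZE,
 * which depends on architecture and configuration.
 */
#define SKETCH_PAGE_SIZE 4096ULL

/* Round x up to the next multiple of align (align must be a power of two). */
static inline uint64_t sketch_round_up(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Follows the non-per-CPU branch of the quoted hunk.
 * header_size is a stand-in for sizeof(*array); its real value is not
 * given in this thread, so callers pass an assumed number.
 */
static inline uint64_t sketch_array_size(uint64_t header_size,
					 uint64_t max_entries,
					 uint64_t elem_size, int mmapable)
{
	uint64_t size = header_size;

	if (mmapable) {
		/* pad the header to a full page so the data starts page-aligned */
		size = sketch_round_up(size, SKETCH_PAGE_SIZE);
		size += max_entries * elem_size;
		/* pad the data so the mapping covers whole pages */
		size = sketch_round_up(size, SKETCH_PAGE_SIZE);
	} else {
		size += max_entries * elem_size;
	}
	return size;
}
```

With an assumed 256-byte header, sketch_array_size(256, 1, 8, 0) comes to 264
bytes, while sketch_array_size(256, 1, 8, 1) comes to 8192 bytes (two pages).
That fixed per-map cost is what makes lots of tiny mmap-able arrays
unattractive, and why batching global data into as few maps as possible helps.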