On 11/15/19 3:31 PM, Daniel Borkmann wrote: > On 11/15/19 5:02 AM, Andrii Nakryiko wrote: >> Add ability to memory-map contents of BPF array map. This is extremely >> useful >> for working with BPF global data from userspace programs. It allows to >> avoid >> typical bpf_map_{lookup,update}_elem operations, improving both >> performance >> and usability. >> >> There had to be special considerations for map freezing, to avoid having >> writable memory view into a frozen map. To solve this issue, map >> freezing and >> mmap-ing is happening under mutex now: >> - if map is already frozen, no writable mapping is allowed; >> - if map has writable memory mappings active (accounted in >> map->writecnt), >> map freezing will keep failing with -EBUSY; >> - once number of writable memory mappings drops to zero, map >> freezing can be >> performed again. >> >> Only non-per-CPU plain arrays are supported right now. Maps with >> spinlocks >> can't be memory mapped either. >> >> For BPF_F_MMAPABLE array, memory allocation has to be done through >> vmalloc() >> to be mmap()'able. We also need to make sure that array data memory is >> page-sized and page-aligned, so we over-allocate memory in such a way >> that >> struct bpf_array is at the end of a single page of memory with >> array->value >> being aligned with the start of the second page. On deallocation we >> need to >> accomodate this memory arrangement to free vmalloc()'ed memory correctly. >> >> One important consideration regarding how memory-mapping subsystem >> functions. >> Memory-mapping subsystem provides few optional callbacks, among them >> open() >> and close(). close() is called for each memory region that is >> unmapped, so >> that users can decrease their reference counters and free up >> resources, if >> necessary. open() is *almost* symmetrical: it's called for each memory >> region >> that is being mapped, **except** the very first one. So bpf_map_mmap does >> initial refcnt bump, while open() will do any extra ones after that. Thus >> number of close() calls is equal to number of open() calls plus one more. >> >> Cc: Johannes Weiner <hannes@xxxxxxxxxxx> >> Cc: Rik van Riel <riel@xxxxxxxxxxx> >> Acked-by: Song Liu <songliubraving@xxxxxx> >> Acked-by: John Fastabend <john.fastabend@xxxxxxxxx> >> Signed-off-by: Andrii Nakryiko <andriin@xxxxxx> > > [...] >> +/* called for any extra memory-mapped regions (except initial) */ >> +static void bpf_map_mmap_open(struct vm_area_struct *vma) >> +{ >> + struct bpf_map *map = vma->vm_file->private_data; >> + >> + bpf_map_inc(map); > > This would also need to inc uref counter since it's technically a reference > of this map into user space as otherwise if map->ops->map_release_uref > would > be used for maps supporting mmap, then the callback would trigger even > if user > space still has a reference to it. I thought we use uref only for array that can hold FDs ? That's why I suggested Andrii earlier to drop uref++.