On 11/16/19 12:47 AM, Alexei Starovoitov wrote:
On 11/15/19 3:44 PM, Daniel Borkmann wrote:
On 11/16/19 12:37 AM, Alexei Starovoitov wrote:
On 11/15/19 3:31 PM, Daniel Borkmann wrote:
On 11/15/19 5:02 AM, Andrii Nakryiko wrote:
Add ability to memory-map contents of BPF array map. This is extremely
useful
for working with BPF global data from userspace programs. It allows to
avoid
typical bpf_map_{lookup,update}_elem operations, improving both
performance
and usability.
There had to be special considerations for map freezing, to avoid
having
writable memory view into a frozen map. To solve this issue, map
freezing and
mmap-ing is happening under mutex now:
- if map is already frozen, no writable mapping is allowed;
- if map has writable memory mappings active (accounted in
map->writecnt),
map freezing will keep failing with -EBUSY;
- once number of writable memory mappings drops to zero, map
freezing can be
performed again.
Only non-per-CPU plain arrays are supported right now. Maps with
spinlocks
can't be memory mapped either.
For BPF_F_MMAPABLE array, memory allocation has to be done through
vmalloc()
to be mmap()'able. We also need to make sure that array data memory is
page-sized and page-aligned, so we over-allocate memory in such a way
that
struct bpf_array is at the end of a single page of memory with
array->value
being aligned with the start of the second page. On deallocation we
need to
accomodate this memory arrangement to free vmalloc()'ed memory
correctly.
One important consideration regarding how memory-mapping subsystem
functions.
Memory-mapping subsystem provides few optional callbacks, among them
open()
and close(). close() is called for each memory region that is
unmapped, so
that users can decrease their reference counters and free up
resources, if
necessary. open() is *almost* symmetrical: it's called for each memory
region
that is being mapped, **except** the very first one. So bpf_map_mmap
does
initial refcnt bump, while open() will do any extra ones after that.
Thus
number of close() calls is equal to number of open() calls plus one
more.
Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxxx>
Acked-by: Song Liu <songliubraving@xxxxxx>
Acked-by: John Fastabend <john.fastabend@xxxxxxxxx>
Signed-off-by: Andrii Nakryiko <andriin@xxxxxx>
[...]
+/* called for any extra memory-mapped regions (except initial) */
+static void bpf_map_mmap_open(struct vm_area_struct *vma)
+{
+ struct bpf_map *map = vma->vm_file->private_data;
+
+ bpf_map_inc(map);
This would also need to inc uref counter since it's technically a
reference
of this map into user space as otherwise if map->ops->map_release_uref
would
be used for maps supporting mmap, then the callback would trigger even
if user
space still has a reference to it.
I thought we use uref only for array that can hold FDs ?
That's why I suggested Andrii earlier to drop uref++.
Yeah, only for fd array currently. Question is, if we ever reuse that
map_release_uref
callback in future for something else, will we remember that we earlier
missed to add
it here? :/
What do you mean 'missed to add' ?
Was saying missed to add the inc/put for the uref counter.
This is mmap path. Anything that needs releasing (like FDs for
prog_array or progs for sockmap) cannot be mmap-able.
Right, I meant if in future we ever have another use case outside of it
for some reason (unrelated to those maps you mention above). Can we
guarantee this is never going to happen? Seemed less fragile at least to
maintain proper count here.
Thanks,
Daniel