On Thu, Nov 7, 2019 at 10:39 PM Song Liu <songliubraving@xxxxxx> wrote:
>
>
>
> > On Nov 7, 2019, at 8:20 PM, Andrii Nakryiko <andriin@xxxxxx> wrote:
> >
> > Add the ability to memory-map the contents of a BPF array map. This is
> > extremely useful for working with BPF global data from userspace
> > programs. It allows avoiding typical bpf_map_{lookup,update}_elem
> > operations, improving both performance and usability.
> >
> > There had to be special considerations for map freezing, to avoid having
> > a writable memory view into a frozen map. To solve this issue, map
> > freezing and mmap-ing now happen under a mutex:
> > - if the map is already frozen, no writable mapping is allowed;
> > - if the map has writable memory mappings active (accounted in
> >   map->writecnt), map freezing will keep failing with -EBUSY;
> > - once the number of writable memory mappings drops to zero, map
> >   freezing can be performed again.
> >
> > Only non-per-CPU arrays are supported right now. Maps with spinlocks
> > can't be memory-mapped either.
> >
> > Cc: Rik van Riel <riel@xxxxxxxxxxx>
> > Cc: Johannes Weiner <hannes@xxxxxxxxxxx>
> > Signed-off-by: Andrii Nakryiko <andriin@xxxxxx>
>
> Acked-by: Song Liu <songliubraving@xxxxxx>
>
> With one nit below.
>
> [...]
>
> > -	if (percpu)
> > +	data_size = 0;
> > +	if (percpu) {
> >  		array_size += (u64) max_entries * sizeof(void *);
> > -	else
> > -		array_size += (u64) max_entries * elem_size;
> > +	} else {
> > +		if (attr->map_flags & BPF_F_MMAPABLE) {
> > +			data_size = (u64) max_entries * elem_size;
> > +			data_size = round_up(data_size, PAGE_SIZE);
> > +		} else {
> > +			array_size += (u64) max_entries * elem_size;
> > +		}
> > +	}
> >
> >  	/* make sure there is no u32 overflow later in round_up() */
> > -	cost = array_size;
> > +	cost = array_size + data_size;
>
> This is a little confusing. Maybe we can do

I don't think I can do that without even bigger code churn. In the
non-mmap()-able case, array_size specifies the size of one chunk of memory,
which consists of sizeof(struct bpf_array) bytes, followed by the actual
data. This is accomplished in one allocation. That's the current case for
arrays.

For the BPF_F_MMAPABLE case, though, we have to do 2 separate allocations,
to make sure that the mmap()-able part is allocated with vmalloc() and is
page-aligned. So array_size keeps track of the number of bytes allocated
for struct bpf_array plus, optionally, per-CPU or non-mmapable array data,
while data_size is explicitly for the vmalloc()-ed, mmap()-able chunk of
data. If not for this, I'd just keep adjusting array_size.

So the invariant for the per-CPU and non-mmapable cases is: data_size = 0,
array_size = sizeof(struct bpf_array) + whatever amount of data we need.
For the mmapable case: array_size = sizeof(struct bpf_array), data_size =
the actual amount of array data.

> data_size = (u64) max_entries * (percpu ? sizeof(void *) : elem_size);
> if (attr->map_flags & BPF_F_MMAPABLE)
> 	data_size = round_up(data_size, PAGE_SIZE);
>
> cost = array_size + data_size;
>
> So we use data_size in all cases.
>
> Maybe also rename array_size.
>
> >  	if (percpu)
> >  		cost += (u64)attr->max_entries * elem_size * num_possible_cpus();
>
> And maybe we can also include this in data_size.

see above.

> [...]
>
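To make the array_size/data_size invariant above concrete, here is a
minimal, self-contained user-space sketch in C. It is NOT the kernel code:
struct fake_bpf_array, compute_sizes(), round_up_u64() and FAKE_PAGE_SIZE
are made-up stand-ins; only the BPF_F_MMAPABLE bit value matches the UAPI
flag this series adds.

/*
 * Sketch of the size accounting discussed above. data_size is non-zero
 * only for the separate, page-aligned mmap()-able allocation; everything
 * else rides along in the array_size chunk.
 */
#include <stdint.h>
#include <stdio.h>

#define FAKE_PAGE_SIZE	4096ULL
#define BPF_F_MMAPABLE	(1U << 10)	/* UAPI flag added by this series */

struct fake_bpf_array { char hdr[320]; };	/* stand-in for struct bpf_array */

static uint64_t round_up_u64(uint64_t x, uint64_t align)
{
	return (x + align - 1) / align * align;
}

static void compute_sizes(uint32_t max_entries, uint32_t elem_size,
			  int percpu, uint32_t map_flags,
			  uint64_t *array_size, uint64_t *data_size)
{
	*array_size = sizeof(struct fake_bpf_array);
	*data_size = 0;

	if (percpu) {
		/* pointers to per-CPU storage live in the same chunk */
		*array_size += (uint64_t)max_entries * sizeof(void *);
	} else if (map_flags & BPF_F_MMAPABLE) {
		/* data goes into a second, page-aligned vmalloc() chunk */
		*data_size = round_up_u64((uint64_t)max_entries * elem_size,
					  FAKE_PAGE_SIZE);
	} else {
		/* plain array: data follows struct bpf_array directly */
		*array_size += (uint64_t)max_entries * elem_size;
	}
}

int main(void)
{
	uint64_t a, d;

	compute_sizes(3, 1000, 0, BPF_F_MMAPABLE, &a, &d);
	printf("mmapable:    array_size=%llu data_size=%llu\n",
	       (unsigned long long)a, (unsigned long long)d);	/* 320, 4096 */

	compute_sizes(3, 1000, 0, 0, &a, &d);
	printf("plain array: array_size=%llu data_size=%llu\n",
	       (unsigned long long)a, (unsigned long long)d);	/* 3320, 0 */
	return 0;
}

In both cases cost = array_size + data_size, which is exactly what the
hunk quoted above computes before the u32-overflow check.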
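Similarly, here is a hedged user-space sketch of the freeze vs.
writable-mmap interlock from the commit message, with a pthread mutex
standing in for the kernel's struct mutex. All names are invented; the
writecnt/frozen/-EBUSY semantics come from the patch description, while
the -EPERM value for a writable mapping of a frozen map is my assumption
(the commit message only says no writable mapping is allowed). Build with
-pthread.

#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct fake_map {
	pthread_mutex_t freeze_mutex;	/* kernel uses struct mutex */
	int writecnt;			/* active writable mappings */
	bool frozen;
};

static struct fake_map map = { .freeze_mutex = PTHREAD_MUTEX_INITIALIZER };

/* models a writable mmap() of the map */
static int fake_map_mmap_writable(struct fake_map *m)
{
	int err = 0;

	pthread_mutex_lock(&m->freeze_mutex);
	if (m->frozen)
		err = -EPERM;		/* no writable view into a frozen map */
	else
		m->writecnt++;
	pthread_mutex_unlock(&m->freeze_mutex);
	return err;
}

/* models tearing down a writable mapping */
static void fake_map_munmap_writable(struct fake_map *m)
{
	pthread_mutex_lock(&m->freeze_mutex);
	m->writecnt--;			/* at zero, freezing works again */
	pthread_mutex_unlock(&m->freeze_mutex);
}

/* models the BPF_MAP_FREEZE command */
static int fake_map_freeze(struct fake_map *m)
{
	int err = 0;

	pthread_mutex_lock(&m->freeze_mutex);
	if (m->writecnt)
		err = -EBUSY;		/* writable mappings still active */
	else
		m->frozen = true;
	pthread_mutex_unlock(&m->freeze_mutex);
	return err;
}

int main(void)
{
	fake_map_mmap_writable(&map);
	printf("freeze with writable mapping: %d\n", fake_map_freeze(&map));	/* -EBUSY */
	fake_map_munmap_writable(&map);
	printf("freeze after unmap:           %d\n", fake_map_freeze(&map));	/* 0 */
	printf("writable mmap after freeze:   %d\n", fake_map_mmap_writable(&map)); /* -EPERM */
	return 0;
}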