On 3/24/22 4:41 PM, Hao Luo wrote:
Some map types support mmap operation, which allows userspace to communicate with BPF programs directly. Currently only arraymap and ringbuf have mmap implemented. However, in some use cases, when multiple program instances can run concurrently, global mmapable memory can cause race. In that case, userspace needs to provide necessary synchronizations to coordinate the usage of mapped global data. This can be a source of bottleneck.
I can see your use case here. Each calling process can get the corresponding bpf program task local storage data through mmap interface. As you mentioned, there is a tradeoff between more memory vs. non-global synchronization. I am thinking that another bpf_iter approach can retrieve the similar result. We could implement a bpf_iter for task local storage map, optionally it can provide a tid to retrieve the data for that particular tid. This way, user space needs an explicit syscall, but does not need to allocate more memory than necessary. WDYT?
It would be great to have a mmapable local storage in that case. This patch adds that. Mmap isn't BPF syscall, so unpriv users can also use it to interact with maps. Currently the only way of allocating mmapable map area is using vmalloc() and it's only used at map allocation time. Vmalloc() may sleep, therefore it's not suitable for maps that may allocate memory in an atomic context such as local storage. Local storage uses kmalloc() with GFP_ATOMIC, which doesn't sleep. This patch uses kmalloc() with GFP_ATOMIC as well for mmapable map area. Allocating mmapable memory has requirment on page alignment. So we have to deliberately allocate more memory than necessary to obtain an address that has sdata->data aligned at page boundary. The calculations for mmapable allocation size, and the actual allocation/deallocation are packaged in three functions: - bpf_map_mmapable_alloc_size() - bpf_map_mmapable_kzalloc() - bpf_map_mmapable_kfree() BPF local storage uses them to provide generic mmap API: - bpf_local_storage_mmap() And task local storage adds the mmap callback: - task_storage_map_mmap() When application calls mmap on a task local storage, it gets its own local storage. Overall, mmapable local storage trades off memory with flexibility and efficiency. It brings memory fragmentation but can make programs stateless. Therefore useful in some cases. Hao Luo (2): bpf: Mmapable local storage. selftests/bpf: Test mmapable task local storage. include/linux/bpf.h | 4 + include/linux/bpf_local_storage.h | 5 +- kernel/bpf/bpf_local_storage.c | 73 +++++++++++++++++-- kernel/bpf/bpf_task_storage.c | 40 ++++++++++ kernel/bpf/syscall.c | 67 +++++++++++++++++ .../bpf/prog_tests/task_local_storage.c | 38 ++++++++++ .../bpf/progs/task_local_storage_mmapable.c | 38 ++++++++++ 7 files changed, 257 insertions(+), 8 deletions(-) create mode 100644 tools/testing/selftests/bpf/progs/task_local_storage_mmapable.c