Re: [PATCH v1 bpf-next 1/2] bpf: Support BPF_F_MMAPABLE task_local storage

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 11/20/23 12:59 PM, Dave Marchevsky wrote:
This patch modifies the generic bpf_local_storage infrastructure to
support mmapable map values and adds mmap() handling to task_local
storage leveraging this new functionality. A userspace task which
mmap's a task_local storage map will receive a pointer to the map_value
corresponding to that tasks' key - mmap'ing in other tasks' mapvals is
not supported in this patch.

Currently, struct bpf_local_storage_elem contains both bookkeeping
information as well as a struct bpf_local_storage_data with additional
bookkeeping information and the actual mapval data. We can't simply map
the page containing this struct into userspace. Instead, mmapable
local_storage uses bpf_local_storage_data's data field to point to the
actual mapval, which is allocated separately such that it can be
mmapped. Only the mapval lives on the page(s) allocated for it.

The lifetime of the actual_data mmapable region is tied to the
bpf_local_storage_elem which points to it. This doesn't necessarily mean
that the pages go away when the bpf_local_storage_elem is free'd - if
they're mapped into some userspace process they will remain until
unmapped, but are no longer the task_local storage's mapval.

Implementation details:

   * A few small helpers are added to deal with bpf_local_storage_data's
     'data' field having different semantics when the local_storage map
     is mmapable. With their help, many of the changes to existing code
     are purely mechanical (e.g. sdata->data becomes sdata_mapval(sdata),
     selem->elem_size becomes selem_bytes_used(selem)).

   * The map flags are copied into bpf_local_storage_data when its
     containing bpf_local_storage_elem is alloc'd, since the
     bpf_local_storage_map associated with them may be gone when
     bpf_local_storage_data is free'd, and testing flags for
     BPF_F_MMAPABLE is necessary when free'ing to ensure that the
     mmapable region is free'd.
     * The extra field doesn't change bpf_local_storage_elem's size.
       There were 48 bytes of padding after the bpf_local_storage_data
       field, now there are 40.

   * Currently, bpf_local_storage_update always creates a new
     bpf_local_storage_elem for the 'updated' value - the only exception
     being if the map_value has a bpf_spin_lock field, in which case the
     spin lock is grabbed instead of the less granular bpf_local_storage
     lock, and the value updated in place. This inplace update behavior
     is desired for mmapable local_storage map_values as well, since
     creating a new selem would result in new mmapable pages.

   * The size of the mmapable pages are accounted for when calling
     mem_{charge,uncharge}. If the pages are mmap'd into a userspace task
     mem_uncharge may be called before they actually go away.

Signed-off-by: Dave Marchevsky <davemarchevsky@xxxxxx>
---
  include/linux/bpf_local_storage.h |  14 ++-
  kernel/bpf/bpf_local_storage.c    | 145 ++++++++++++++++++++++++------
  kernel/bpf/bpf_task_storage.c     |  35 ++++++--
  kernel/bpf/syscall.c              |   2 +-
  4 files changed, 163 insertions(+), 33 deletions(-)

diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
index 173ec7f43ed1..114973f925ea 100644
--- a/include/linux/bpf_local_storage.h
+++ b/include/linux/bpf_local_storage.h
@@ -69,7 +69,17 @@ struct bpf_local_storage_data {
  	 * the number of cachelines accessed during the cache hit case.
  	 */
  	struct bpf_local_storage_map __rcu *smap;
-	u8 data[] __aligned(8);
+	/* Need to duplicate smap's map_flags as smap may be gone when
+	 * it's time to free bpf_local_storage_data
+	 */
+	u64 smap_map_flags;
+	/* If BPF_F_MMAPABLE, this is a void * to separately-alloc'd data
+	 * Otherwise the actual mapval data lives here
+	 */
+	union {
+		DECLARE_FLEX_ARRAY(u8, data) __aligned(8);
+		void *actual_data __aligned(8);

I think we can remove __aligned(8) from 'void *actual_data __aligned(8)'
in the above. There are two reasons, first, the first element in the union
is aligned(8) and then the rest of union members will be at least aligned 8.
second, IIUC, in the code, we have
    actual_data = mmapable_value
where mmapable_value has type 'void *'. So actual_data is used
to hold a pointer to mmap-ed page, so it only needs pointer type
alignment which can already be achieved with 'void *'.
So we can remove __aligned(8) safely. It can also remove
an impression that 'actual_data' has to be 8-byte aligned in
32-bit system although it does not need to be just by 'actual_data'
itself.


+	};
  };
/* Linked to bpf_local_storage and bpf_local_storage_map */
@@ -124,6 +134,8 @@ static struct bpf_local_storage_cache name = {			\
  /* Helper functions for bpf_local_storage */
  int bpf_local_storage_map_alloc_check(union bpf_attr *attr);
+void *sdata_mapval(struct bpf_local_storage_data *data);
+
  struct bpf_map *
  bpf_local_storage_map_alloc(union bpf_attr *attr,
  			    struct bpf_local_storage_cache *cache,

[...]





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux