[RFC PATCH bpf-next 0/6] bpf: Handle reuse in bpf memory alloc

From: Hou Tao <houtao1@xxxxxxxxxx>

Hi,

This patchset tries to fix the problems found while checking how the
htab map handles element reuse in the bpf memory allocator. The
immediate reuse of freed elements may lead to two problems in the htab
map:

(1) Reuse reinitializes special fields (e.g., bpf_spin_lock) in the
    htab map value, which may corrupt a concurrent lookup with the
    BPF_F_LOCK flag, because such a lookup acquires the bpf-spin-lock
    while copying the value. The corruption of the bpf-spin-lock may
    result in a hard lock-up.
(2) A lookup may return an incorrect map value if the found element is
    freed and then reused.

Because all elements of an htab map have the same type, problem #1 can
be fixed by supporting a ctor in the bpf memory allocator. The ctor
initializes the special fields in a map element only when the element
is newly allocated. A reused element is not reinitialized.
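
As a rough illustration of the ctor idea, here is a minimal userspace
sketch. It is not the code from the patches: struct elem, struct
toy_alloc, elem_ctor() and the toy_alloc_*() helpers are hypothetical
names, and a plain int stands in for bpf_spin_lock. It only shows that
the ctor runs on fresh allocations and is skipped on reuse.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an htab element with a special field. */
struct elem {
	struct elem *next;	/* free-list linkage */
	int lock;		/* stand-in for bpf_spin_lock, 0 = unlocked */
	int value;
};

/* Hypothetical allocator with an optional constructor. */
struct toy_alloc {
	struct elem *free_list;
	void (*ctor)(struct elem *e);
};

/* Initializes only the special field. */
static void elem_ctor(struct elem *e)
{
	e->lock = 0;
}

static struct elem *toy_alloc_get(struct toy_alloc *ma)
{
	struct elem *e = ma->free_list;

	if (e) {
		/* Reused element: special fields are left untouched. */
		ma->free_list = e->next;
		e->next = NULL;
		return e;
	}

	e = malloc(sizeof(*e));
	if (!e)
		return NULL;
	e->next = NULL;
	e->value = 0;
	/* Newly allocated element: run the ctor exactly once. */
	if (ma->ctor)
		ma->ctor(e);
	return e;
}

static void toy_alloc_put(struct toy_alloc *ma, struct elem *e)
{
	assert(e != NULL);
	e->next = ma->free_list;
	ma->free_list = e;
}
```

With this shape, a reader that still holds the lock of a freed element
does not see the lock word reset underneath it when the element is
handed out again.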

Problem #2 exists for both non-preallocated and preallocated htab maps.
Adding a seq to the htab element, checking it for reuse, and retrying
the lookup may be a feasible solution, but it would make the lookup API
hard to use, because the user would need to check whether the found
element has been reused and repeat the lookup if so. A simpler solution
is to disable the reuse of freed elements and free them only after the
lookup procedure ends.
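
To make the cost of the seq-based alternative concrete, here is a
single-threaded sketch of the check a caller would have to carry. The
struct seq_elem layout and both helpers are hypothetical, and the
memory barriers a real concurrent version would need are deliberately
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical element layout: seq is bumped on every reuse. */
struct seq_elem {
	unsigned int seq;
	int key;
	int value;
};

/*
 * Copy the value out and report whether the copy can be trusted.
 * seen_seq is the sequence number sampled when the element was found.
 * A false return means the element was reused and the caller must
 * repeat the whole lookup.
 */
static bool read_value_checked(const struct seq_elem *e,
			       unsigned int seen_seq, int *out)
{
	assert(e != NULL && out != NULL);
	*out = e->value;
	return e->seq == seen_seq;
}

/* Models free followed by immediate reuse for a different key. */
static void reuse_elem(struct seq_elem *e, int key, int value)
{
	e->seq++;
	e->key = key;
	e->value = value;
}
```

Every lookup site would need the sample, copy, compare, and retry
sequence, which is the usability burden described above.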

To reduce the overhead of calling call_rcu_tasks_trace() for each freed
element, these elements are freed in batches. Freed elements are first
moved onto a global per-cpu free list. Once the number of freed elements
reaches a threshold, they are moved into a dynamically allocated object
and freed by a global per-cpu worker through call_rcu_tasks_trace().
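
The batching scheme can be sketched in userspace as follows. This is a
simplified model under stated assumptions, not the patch itself:
BATCH_THRESHOLD, struct free_ctx, struct free_batch and both functions
are hypothetical, there is a single list instead of per-cpu lists, the
worker is folded into ctx_free(), and the RCU tasks trace grace period
is modelled by an explicit call to ctx_grace_period_elapsed().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BATCH_THRESHOLD 4	/* hypothetical, no value is given above */

struct node {
	struct node *next;
};

/* Stand-in for the dynamically allocated batch object. */
struct free_batch {
	struct node *head;
	unsigned int cnt;
	struct free_batch *next_pending;
};

struct free_ctx {
	struct node *free_list;		/* stand-in for a per-cpu free list */
	unsigned int free_cnt;
	struct free_batch *pending;	/* batches waiting for a grace period */
	unsigned int freed_total;
};

/*
 * Queue a freed element. Once the threshold is reached, move the whole
 * list into a freshly allocated batch. Returns -1 if that allocation
 * fails, in which case the elements simply stay on free_list.
 */
static int ctx_free(struct free_ctx *c, struct node *n)
{
	struct free_batch *b;

	assert(c != NULL && n != NULL);
	n->next = c->free_list;
	c->free_list = n;
	if (++c->free_cnt < BATCH_THRESHOLD)
		return 0;

	b = malloc(sizeof(*b));
	if (!b)
		return -1;
	b->head = c->free_list;
	b->cnt = c->free_cnt;
	c->free_list = NULL;
	c->free_cnt = 0;
	/* The kernel would hand the batch to call_rcu_tasks_trace() here. */
	b->next_pending = c->pending;
	c->pending = b;
	return 0;
}

/* Stand-in for the RCU callbacks that run after the grace period. */
static void ctx_grace_period_elapsed(struct free_ctx *c)
{
	while (c->pending) {
		struct free_batch *b = c->pending;

		c->pending = b->next_pending;
		while (b->head) {
			struct node *n = b->head;

			b->head = n->next;
			free(n);
			c->freed_total++;
		}
		free(b);
	}
}
```

One grace-period callback then covers BATCH_THRESHOLD elements instead
of one, at the price of an extra allocation per batch.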

Because the solution needs to allocate memory in order to free memory,
if no memory is available, the global per-cpu worker calls
rcu_barrier_tasks_trace() to wait for the RCU grace period to expire
and then frees the elements that have been spliced onto a temporary
list. Elements freed in the meantime will be freed after another round
of rcu_barrier_tasks_trace() if there is still no memory. It may be
necessary to reserve some bpf_ma_free_batch objects to speed up freeing.
The patchset also does not yet consider the case where the RCU grace
period is slow. Because the newly allocated memory (aka
bpf_ma_free_batch) is only freed after the RCU grace period expires, a
slow grace period may lead to too many bpf_ma_free_batch objects being
allocated.

After applying BPF_MA_NO_REUSE to the htab map, the performance of
"./map_perf_test 4 18 8192" drops from 520K to 330K events per sec on
one CPU. That is a big performance degradation, so I hope to get some
feedback on whether the change is necessary and on how to better fix
the reuse problem in the htab map (globally allocated objects may have
the same problem as the htab map). Comments are always welcome.

Regards,
Hou

Hou Tao (6):
  bpf: Support ctor in bpf memory allocator
  bpf: Factor out a common helper free_llist()
  bpf: Pass bitwise flags to bpf_mem_alloc_init()
  bpf: Introduce BPF_MA_NO_REUSE for bpf memory allocator
  bpf: Use BPF_MA_NO_REUSE in htab map
  selftests/bpf: Add test case for element reuse in htab map

 include/linux/bpf_mem_alloc.h                 |  12 +-
 kernel/bpf/core.c                             |   2 +-
 kernel/bpf/hashtab.c                          |  17 +-
 kernel/bpf/memalloc.c                         | 218 ++++++++++++++++--
 .../selftests/bpf/prog_tests/htab_reuse.c     | 111 +++++++++
 .../testing/selftests/bpf/progs/htab_reuse.c  |  19 ++
 6 files changed, 353 insertions(+), 26 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/htab_reuse.c
 create mode 100644 tools/testing/selftests/bpf/progs/htab_reuse.c

-- 
2.29.2



