Re: [RFC PATCH bpf-next 0/6] bpf: Handle reuse in bpf memory alloc

On 1/3/23 10:30 PM, Hou Tao wrote:
Hi,

On 1/4/2023 2:10 PM, Yonghong Song wrote:


On 1/3/23 5:47 AM, Hou Tao wrote:
Hi,

On 1/2/2023 2:48 AM, Yonghong Song wrote:


On 12/31/22 5:26 PM, Alexei Starovoitov wrote:
On Fri, Dec 30, 2022 at 12:11:45PM +0800, Hou Tao wrote:
From: Hou Tao <houtao1@xxxxxxxxxx>

Hi,

The patchset tries to fix the problems found when checking how the htab map
handles element reuse in the bpf memory allocator. The immediate reuse of
freed elements may lead to two problems in the htab map:

(1) reuse will reinitialize special fields (e.g., bpf_spin_lock) in the
       htab map value, and it may corrupt a lookup procedure with the
       BPF_F_LOCK flag, which acquires the bpf_spin_lock during value
       copying. The corruption of the bpf_spin_lock may result in a hard
       lock-up (see the interleaving sketched below).
(2) a lookup procedure may get an incorrect map value if the found element
       is freed and then reused.
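
To make problem #1 concrete, the interleaving looks roughly as follows;
lookup_elem_raw() and copy_map_value_locked() are the existing helpers,
the exact timing is illustrative:

   CPU 0 (lookup with BPF_F_LOCK)         CPU 1 (delete + update)
   ------------------------------         -----------------------
   l = lookup_elem_raw(...)
   copy_map_value_locked(map, dst,
                         value, true)
     -> bpf_spin_lock(&value->lock)
     -> copying the value ...             htab_map_delete_elem()
                                            -> bpf_mem_cache_free(&htab->ma, l)
                                          htab_map_update_elem()
                                            -> bpf_mem_cache_alloc() reuses l
                                            -> check_and_init_map_value()
                                               zeroes the bpf_spin_lock that
                                               CPU 0 is still holding
     -> bpf_spin_unlock() on a clobbered
        lock word => possible hard lock-up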

Because all elements of an htab map have the same type, problem #1 can be
fixed by supporting a ctor in the bpf memory allocator. The ctor initializes
the special fields in a map element only when the element is newly
allocated; a reused element is not reinitialized.
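
A rough sketch of the ctor idea; the hook, the pop_pcpu_freelist() helper
and the ctor/ctor_ctx fields are illustrative names, not necessarily the
exact API of the patchset:

        /* hypothetical: a per-cache ctor, run only for brand-new objects */
        typedef void (*bpf_ma_ctor_t)(struct bpf_mem_cache *c, void *obj);

        static void *__try_alloc(struct bpf_mem_cache *c, int node)
        {
                void *obj;

                obj = pop_pcpu_freelist(c);     /* illustrative helper */
                if (obj)
                        return obj;             /* reused: no re-init */

                obj = __alloc(c, node);
                if (obj && c->ctor)
                        c->ctor(c, obj);        /* runs once per object */
                return obj;
        }

        /* hash table side: zero the special fields exactly once */
        static void htab_elem_ctor(struct bpf_mem_cache *c, void *obj)
        {
                struct bpf_htab *htab = c->ctor_ctx;    /* illustrative */
                struct htab_elem *l = obj;

                check_and_init_map_value(&htab->map,
                                         l->key + round_up(htab->map.key_size, 8));
        }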

Instead of adding the overhead of a ctor callback, let's just
add __GFP_ZERO to the flags in __alloc().
That will address issue 1 and will make bpf_mem_alloc behave just
like percpu_freelist, so a hashmap with BPF_F_NO_PREALLOC and the default
one will behave the same way.
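
Concretely, that is a one-flag change in kernel/bpf/memalloc.c. A minimal
sketch, assuming __alloc() still builds its gfp mask locally (percpu branch
trimmed):

        static void *__alloc(struct bpf_mem_cache *c, int node)
        {
                /* GFP_NOWAIT, so we don't deplete the atomic reserves a
                 * typical GFP_ATOMIC would; __GFP_ZERO makes every object
                 * handed out by bpf_mem_alloc start zeroed, matching the
                 * guarantee the preallocated (percpu_freelist based) htab
                 * already has.
                 */
                gfp_t flags = GFP_NOWAIT | __GFP_NOWARN | __GFP_ACCOUNT |
                              __GFP_ZERO;

                /* ... percpu branch elided, it uses the same flags ... */

                return kmalloc_node(c->unit_size, flags, node);
        }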

Patch https://lore.kernel.org/all/20220809213033.24147-3-memxor@xxxxxxxxx/
tried to address a similar issue for the LRU hash table.
Maybe we need to do something similar after bpf_mem_cache_alloc() for
the hash table?
IMO either the ctor or __GFP_ZERO will fix the issue. Did I miss something here?

The following is my understanding:
in function alloc_htab_elem() (hashtab.c), we have

                 if (is_map_full(htab))
                         if (!old_elem)
                                 /* when map is full and update() is replacing
                                  * old element, it's ok to allocate, since
                                  * old element will be freed immediately.
                                  * Otherwise return an error
                                  */
                                 return ERR_PTR(-E2BIG);
                 inc_elem_count(htab);
                 l_new = bpf_mem_cache_alloc(&htab->ma);
                 if (!l_new) {
                         l_new = ERR_PTR(-ENOMEM);
                         goto dec_count;
                 }
                 check_and_init_map_value(&htab->map,
                                          l_new->key + round_up(key_size, 8));

In the above, check_and_init_map_value() intends to initialize an element
obtained from bpf_mem_cache_alloc() (which could be reused from the free list).

check_and_init_map_value() looks like below (in include/linux/bpf.h):

static inline void bpf_obj_init(const struct btf_field_offs *foffs, void *obj)
{
         int i;

         if (!foffs)
                 return;
         for (i = 0; i < foffs->cnt; i++)
                 memset(obj + foffs->field_off[i], 0, foffs->field_sz[i]);
}

static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
{
         bpf_obj_init(map->field_offs, dst);
}

IIUC, bpf_obj_init() will bzero fields like spin_lock, timer,
list_head, list_node, etc.

This is exactly the cause of problem #1 above.
Maybe I missed something?
Yes, this is exactly the problem patch #1 tries to fix. Patch #1 fixes the
problem by calling check_and_init_map_value() only once, for the newly-allocated
element, so if a freed element is reused, its special fields will not be zeroed
again. Are there any other cases which are not covered by this solution, or any
other similar problems in hashtab?

No, I checked all cases of check_and_init_map_value() and didn't find
any other instances.

Problem #2 exists for both non-preallocated and preallocated htab maps.
Adding a seq field to the htab element, checking for reuse, and retrying
the lookup procedure may be a feasible solution (sketched below), but it
would make the lookup API hard to use, because the user would need to
check whether the found element had been reused and repeat the lookup
procedure if so. A simpler solution would be to just disable the reuse of
freed elements and free these elements only after the lookup procedure ends.
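
For reference, the seq scheme would work like a per-element seqcount. A
sketch of the idea only; the seq field and the lookup_copy_value() helper
are hypothetical:

        struct htab_elem {
                /* ... existing fields ... */
                u32 seq;        /* hypothetical: made odd while the element
                                 * is being freed/reused, even when stable */
                char key[] __aligned(8);
        };

        static bool lookup_copy_value(struct bpf_htab *htab,
                                      struct htab_elem *l, void *dst)
        {
                u32 seq = smp_load_acquire(&l->seq);

                if (seq & 1)    /* element is in the middle of reuse */
                        return false;

                copy_map_value(&htab->map, dst,
                               l->key + round_up(htab->map.key_size, 8));

                /* a changed seq means the element was freed and reused
                 * while we were copying; the caller must redo the lookup
                 */
                smp_rmb();
                return READ_ONCE(l->seq) == seq;
        }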

You've proposed this 'solution' twice already in the qptrie thread, and both
times the answer was 'no, we cannot do this', with the reasons explained.
The 3rd time the answer is still the same.
This 'issue 2' has existed in the hashmap since the very beginning, for many years.
It's a known quirk. There is nothing to fix, really.

The graph APIs (aka new-gen data structs) with linked list and rbtree are
in active development. Soon bpf progs will be able to implement their own
hash maps with explicit bpf_rcu_read_lock. At that point the progs will
be making the trade-off between performance and the lookup/delete race
themselves. So please respin with just __GFP_ZERO, and update patch 6
to check for the lock-up only.