Re: [RFC PATCH bpf-next 0/6] bpf: Handle reuse in bpf memory alloc

On Tue, Jan 3, 2023 at 11:14 PM Yonghong Song <yhs@xxxxxxxx> wrote:
>
>
>
> On 1/3/23 10:30 PM, Hou Tao wrote:
> > Hi,
> >
> > On 1/4/2023 2:10 PM, Yonghong Song wrote:
> >>
> >>
> >> On 1/3/23 5:47 AM, Hou Tao wrote:
> >>> Hi,
> >>>
> >>> On 1/2/2023 2:48 AM, Yonghong Song wrote:
> >>>>
> >>>>
> >>>> On 12/31/22 5:26 PM, Alexei Starovoitov wrote:
> >>>>> On Fri, Dec 30, 2022 at 12:11:45PM +0800, Hou Tao wrote:
> >>>>>> From: Hou Tao <houtao1@xxxxxxxxxx>
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> The patchset tries to fix the problems found when checking how htab map
> >>>>>> handles element reuse in bpf memory allocator. The immediate reuse of
> >>>>>> freed elements may lead to two problems in htab map:
> >>>>>>
> >>>>>> (1) reuse will reinitialize special fields (e.g., bpf_spin_lock) in the
> >>>>>>        htab map value, and it may corrupt a lookup procedure that uses the
> >>>>>>        BPF_F_LOCK flag, which acquires the bpf_spin_lock during value
> >>>>>>        copying. Corruption of the bpf_spin_lock may result in a hard
> >>>>>>        lock-up.
> >>>>>> (2) a lookup procedure may get an incorrect map value if the found element
> >>>>>>        is freed and then reused.
> >>>>>>
> >>>>>> Because all elements of an htab map have the same type, problem #1 can be
> >>>>>> fixed by supporting a ctor in the bpf memory allocator. The ctor initializes
> >>>>>> these special fields in a map element only when the element is newly
> >>>>>> allocated; if it is just a reused element, there will be no
> >>>>>> reinitialization.
> >>>>>
> >>>>> Instead of adding the overhead of ctor callback let's just
> >>>>> add __GFP_ZERO to flags in __alloc().
> >>>>> That will address the issue 1 and will make bpf_mem_alloc behave just
> >>>>> like percpu_freelist, so hashmap with BPF_F_NO_PREALLOC and default
> >>>>> will behave the same way.
> >>>>
> >>>> Patch https://lore.kernel.org/all/20220809213033.24147-3-memxor@xxxxxxxxx/
> >>>> tried to address a similar issue for lru hash table.
> >>>> Maybe we need to do similar things after bpf_mem_cache_alloc() for
> >>>> hash table?
> >>> IMO ctor or __GFP_ZERO will fix the issue. Did I miss something here?
> >>
> >> The following is my understanding:
> >> in function alloc_htab_elem() (hashtab.c), we have
> >>
> >>                  if (is_map_full(htab))
> >>                          if (!old_elem)
> >>                                  /* when map is full and update() is replacing
> >>                                   * old element, it's ok to allocate, since
> >>                                   * old element will be freed immediately.
> >>                                   * Otherwise return an error
> >>                                   */
> >>                                  return ERR_PTR(-E2BIG);
> >>                  inc_elem_count(htab);
> >>                  l_new = bpf_mem_cache_alloc(&htab->ma);
> >>                  if (!l_new) {
> >>                          l_new = ERR_PTR(-ENOMEM);
> >>                          goto dec_count;
> >>                  }
> >>                  check_and_init_map_value(&htab->map,
> >>                                           l_new->key + round_up(key_size, 8));
> >>
> >> In the above, check_and_init_map_value() intends to initialize an element
> >> from bpf_mem_cache_alloc() (which could be reused from the free list).
> >>
> >> The check_and_init_map_value() looks like below (in include/linux/bpf.h)
> >>
> >> static inline void bpf_obj_init(const struct btf_field_offs *foffs, void *obj)
> >> {
> >>          int i;
> >>
> >>          if (!foffs)
> >>                  return;
> >>          for (i = 0; i < foffs->cnt; i++)
> >>                  memset(obj + foffs->field_off[i], 0, foffs->field_sz[i]);
> >> }
> >>
> >> static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
> >> {
> >>          bpf_obj_init(map->field_offs, dst);
> >> }
> >>
> >> IIUC, bpf_obj_init() will bzero those fields like spin_lock, timer,
> >> list_head, list_node, etc.
> >>
> >> This is the issue described in problem #1 above.
> >> Maybe I missed something?
> > Yes. It is exactly the problem patch #1 tries to fix. Patch #1 fixes it by
> > calling check_and_init_map_value() only once, for the newly-allocated
> > element, so if a freed element is reused, its special fields will not be
> > zeroed again. Are there any other cases which are not covered by the
> > solution, or any other similar problems in hashtab?
>
> No, I checked all cases of check_and_init_map_value() and didn't find
> any other instances.

check_and_init_map_value() is called in two other cases:
lookup_and_delete[_batch].
There the zeroing of the fields is necessary because the 'value'
is a temp buffer that is going to be copied to user space.
I think the way forward is to add GFP_ZERO to mem_alloc
(to make it equivalent to prealloc), remove one case
of check_and_init_map_value() from hashmap, add short comments
to the two other cases, and add a big comment to check_and_init_map_value()
saying that 'dst' must be a temp buffer and must not
point to memory that could be used in parallel by a bpf prog.
It feels like we've dealt with this issue a couple times already
and keep repeating this mistake, so the more comments the better.
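
For the big comment, something along these lines (wording is just a
first pass, reusing the existing signature from include/linux/bpf.h):

/* 'dst' must point to a temp buffer, e.g. one that is about to be
 * copied to user space. It must not point to memory that a bpf prog
 * can access in parallel, otherwise zeroing bpf_spin_lock, bpf_timer,
 * etc. races with their users.
 */
static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
{
        bpf_obj_init(map->field_offs, dst);
}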


