Re: [PATCH] bpf: Fix percpu address space issues

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Aug 9, 2024 at 10:28 AM Eduard Zingerman <eddyz87@xxxxxxxxx> wrote:
>
> On Sun, 2024-08-04 at 20:55 +0200, Uros Bizjak wrote:
>
> [...]
>
> > Found by GCC's named address space checks.
>
> Please provide some additional details.
> I assume that the definition of __percpu was changed from
> __attribute__((btf_type_tag(percpu))) to
> __attribute__((address_space(??)), is that correct?

This is correct. The fixes in the patch are based on the patch series
[1] that enable strict percpu check via GCC's x86 named address space
qualifiers, and in its RFC state hacks __seg_gs into the __percpu
qualifier (as can be seen in the 3/3 patch). The compiler will detect
pointer address space mismatches for e.g.:

--cut here--
int __seg_gs m;

int *foo (void) { return &m; }
--cut here--

v.c: In function ‘foo’:
v.c:5:26: error: return from pointer to non-enclosed address space
   5 | int *foo (void) { return &m; }
     |                          ^~
v.c:5:26: note: expected ‘int *’ but pointer is of type ‘__seg_gs int *’

and expects explicit casts via uintptr_t when these casts are really
intended ([2], please also see [3] for similar sparse requirement):

int *foo (void) { return (int *)(uintptr_t)&m; }

[1] https://lore.kernel.org/lkml/20240805184012.358023-1-ubizjak@xxxxxxxxx/
[2] https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html#x86-Named-Address-Spaces
[3] https://sparse.docs.kernel.org/en/latest/annotations.html#address-space-name

For reference, I have attached a test source file with more complex
checks that also includes linux percpu verifier macro.

> What is the motivation for this patch?

With the percpu checker patch applied, the build will break with the
above error due to mismatched address space pointers.

> Currently __percpu is defined as a type tag and is used only by BPF verifier,
> where it seems to be relevant only for structure fields and function parameters.
> This patch only changes local variables.

__percpu is also used for sparse checks (Use C=1 CHECK="sparse
-Wcast-from-as"). Sparse also reports address space violations, but
its warnings are not fatal and are not as precise as the compiler. So,
if you try to compile the bpf source when the kernel source is patched
with the percpu checker, the compiler will report several errors that
my bpf patch tries to fix.

> > There were no changes in the resulting object files.
> >
> > [1] https://sparse.docs.kernel.org/en/latest/annotations.html#address-space-name
> >
> > Signed-off-by: Uros Bizjak <ubizjak@xxxxxxxxx>
> > Cc: Alexei Starovoitov <ast@xxxxxxxxxx>
> > Cc: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
> > Cc: Andrii Nakryiko <andrii@xxxxxxxxxx>
> > Cc: Martin KaFai Lau <martin.lau@xxxxxxxxx>
> > Cc: Eduard Zingerman <eddyz87@xxxxxxxxx>
> > Cc: Song Liu <song@xxxxxxxxxx>
> > Cc: Yonghong Song <yonghong.song@xxxxxxxxx>
> > Cc: John Fastabend <john.fastabend@xxxxxxxxx>
> > Cc: KP Singh <kpsingh@xxxxxxxxxx>
> > Cc: Stanislav Fomichev <sdf@xxxxxxxxxxx>
> > Cc: Hao Luo <haoluo@xxxxxxxxxx>
> > Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
> > ---
> >  kernel/bpf/arraymap.c |  8 ++++----
> >  kernel/bpf/hashtab.c  |  8 ++++----
> >  kernel/bpf/helpers.c  |  4 ++--
> >  kernel/bpf/memalloc.c | 12 ++++++------
> >  4 files changed, 16 insertions(+), 16 deletions(-)
> >
> > diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> > index 188e3c2effb2..544ca433275e 100644
> > --- a/kernel/bpf/arraymap.c
> > +++ b/kernel/bpf/arraymap.c
> > @@ -600,7 +600,7 @@ static void *bpf_array_map_seq_start(struct seq_file *seq, loff_t *pos)
> >       array = container_of(map, struct bpf_array, map);
> >       index = info->index & array->index_mask;
> >       if (info->percpu_value_buf)
> > -            return array->pptrs[index];
> > +            return array->ptrs[index];
>
> I disagree with this change.
> One might say that indeed the address space is cast away here,
> however, value returned by this function is only used in functions
> bpf_array_map_seq_{next,show,stop}(), where it is guarded by the same
> 'if (info->percpu_value_buf)' condition to identify if per_cpu_ptr()
> is necessary.

If this is the case, you have to inform the compiler that address
space is cast away with explicit (void *)(uintptr_t) cast, placed
before return. But looking at the union with ptrs and pptrs members,
it looked to me that it is just the case of wrong union member
accessed.

> >       return array_map_elem_ptr(array, index);
> >  }
> >
> > @@ -619,7 +619,7 @@ static void *bpf_array_map_seq_next(struct seq_file *seq, void *v, loff_t *pos)
> >       array = container_of(map, struct bpf_array, map);
> >       index = info->index & array->index_mask;
> >       if (info->percpu_value_buf)
> > -            return array->pptrs[index];
> > +            return array->ptrs[index];
>
> Same as above.
>
> >       return array_map_elem_ptr(array, index);
> >  }
> >
> > @@ -632,7 +632,7 @@ static int __bpf_array_map_seq_show(struct seq_file *seq, void *v)
> >       struct bpf_iter_meta meta;
> >       struct bpf_prog *prog;
> >       int off = 0, cpu = 0;
> > -     void __percpu **pptr;
> > +     void * __percpu *pptr;
>
> Should this be 'void __percpu *pptr;?
> The value comes from array->pptrs[*] field,
> which has the above type for elements.

I didn't want to introduce semantic changes, so I have just changed
the base type fo __percpu one, due to:

per_cpu_ptr(pptr, cpu));

later in the code.

>
> >       u32 size;
> >
> >       meta.seq = seq;
> > @@ -648,7 +648,7 @@ static int __bpf_array_map_seq_show(struct seq_file *seq, void *v)
> >               if (!info->percpu_value_buf) {
> >                       ctx.value = v;
> >               } else {
> > -                     pptr = v;
> > +                     pptr = (void __percpu *)(uintptr_t)v;
> >                       size = array->elem_size;
> >                       for_each_possible_cpu(cpu) {
> >                               copy_map_value_long(map, info->percpu_value_buf + off,
> > diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> > index be1f64c20125..a49212bbda09 100644
> > --- a/kernel/bpf/hashtab.c
> > +++ b/kernel/bpf/hashtab.c
> > @@ -1049,14 +1049,14 @@ static struct htab_elem *alloc_htab_elem(struct bpf_htab *htab, void *key,
> >                       pptr = htab_elem_get_ptr(l_new, key_size);
> >               } else {
> >                       /* alloc_percpu zero-fills */
> > -                     pptr = bpf_mem_cache_alloc(&htab->pcpu_ma);
> > -                     if (!pptr) {
> > +                     void *ptr = bpf_mem_cache_alloc(&htab->pcpu_ma);
> > +                     if (!ptr) {
>
> Why adding an intermediate variable here?

Mainly to avoid several inter-as casts, because l_new->ptr_to_pptr
also expects assignment from generic address space.

> Is casting bpf_mem_cache_alloc() result to percpu not sufficient?
> It looks like bpf_mem_cache_alloc() returns a percpu pointer,
> should it be declared as such?

This function is also used a couple of lines above changed hunk:

        l_new = bpf_mem_cache_alloc(&htab->ma);

where l_new is declared in generic address space.

>
> >                               bpf_mem_cache_free(&htab->ma, l_new);
> >                               l_new = ERR_PTR(-ENOMEM);
> >                               goto dec_count;
> >                       }
> > -                     l_new->ptr_to_pptr = pptr;
> > -                     pptr = *(void **)pptr;
> > +                     l_new->ptr_to_pptr = ptr;
> > +                     pptr = *(void __percpu **)ptr;
> >               }
> >
> >               pcpu_init_value(htab, pptr, value, onallcpus);
>
> [...]
>
> > diff --git a/kernel/bpf/memalloc.c b/kernel/bpf/memalloc.c
> > index dec892ded031..b3858a76e0b3 100644
> > --- a/kernel/bpf/memalloc.c
> > +++ b/kernel/bpf/memalloc.c
> > @@ -138,8 +138,8 @@ static struct llist_node notrace *__llist_del_first(struct llist_head *head)
> >  static void *__alloc(struct bpf_mem_cache *c, int node, gfp_t flags)
> >  {
> >       if (c->percpu_size) {
> > -             void **obj = kmalloc_node(c->percpu_size, flags, node);
> > -             void *pptr = __alloc_percpu_gfp(c->unit_size, 8, flags);
> > +             void __percpu **obj = kmalloc_node(c->percpu_size, flags, node);
>
> Why __percpu is needed for obj?

The new declaration declares "void pointer to percpu pointer", it is
needed because some lines below we have:

        obj[1] = pptr;

and pptr needs to be declared in __percpu named address space due to:

    free_percpu(pptr);

> kmalloc_node is defined as 'alloc_hooks(kmalloc_node_noprof(__VA_ARGS__))',
> alloc_hooks(X) is a macro and it produces result of type typeof(X),
> kmalloc_node_noprof() returns void*, not __percpu void*.
> Do I miss something?

Yes, as explained above, the declaration:

void __percpu **obj

somehow unintuitively declares void pointer to percpu pointer, so the
types match (otherwise, the compiler would complain ;) ).

Thanks,
Uros.
#define NULL 0

#define __verify_pcpu_ptr(ptr)						\
do {									\
	const void __seg_gs *__vpp_verify = (typeof((ptr) + 0))NULL;	\
	(void)__vpp_verify;						\
} while (0)

void __seg_gs **__pptr; // void pointer na void __seg_gs pointer

void *foo1 (void *v)
{
  void __seg_gs **pptr = v;
  __verify_pcpu_ptr (*pptr);
  return pptr;
}

void __seg_gs *foo2 (void *v)
{
  void __seg_gs **pptr = v;
  __verify_pcpu_ptr (*pptr);
  return *pptr;
}

void * __seg_gs *__ptr; // __seg_gs void pointer to void pointer

void __seg_gs *bar1 (void __seg_gs *v)
{
  void * __seg_gs *ptr = v;
  __verify_pcpu_ptr (ptr);
  return ptr;
}

void *bar2 (void __seg_gs *v)
{
  void * __seg_gs *ptr = v;
  __verify_pcpu_ptr (ptr);
  return *ptr;
}

void __seg_gs *qux (void *ptr)
{
  return *(void __seg_gs **)ptr;
}

void __seg_gs *quux (void *ptr)
{
  return ((void __seg_gs **)ptr)[1];
}

void __seg_gs *quuux (void __seg_gs **ptr)
{
  return *ptr;
}

void __seg_gs *test (void *v)
{
  void __seg_gs **pptr = v;

  return *pptr;
}

void *test_ (void *v)
{
  void __seg_gs **pptr = v;

  return pptr;
}

[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux