Follow up from the btf_type_tag discussion in the BPF office hours

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Of the two problems discussed:

1. DW_TAG_LLVM_annotation not being able to denote annotations to
   non-pointed based types.  clang currently ignores these instances.

   We discussed two possible options to deal with this:
   1.1 To continue ignoring these cases in the front-end, keep the dwarf
       expressiveness limitation, and document it.
   1.2 To change DW_TAG_LLVM_annotation so it behaves like a qualifier
       DIE (like const, volatile, etc.) so it can apply to any type.

2. The ordering problem: sparse annotations order differently than
   GNU/C99 compiler attributes.  Therefore translating 1-to-1 from
   sparse annotations to compiler attributes results in attributes with
   different syntax than normal compiler attributes.

   This was accepted in clang.
   But found resistance in GCC when we sent the first patch series.

   During the meeting we went thru several possible ways of dealing with
   this problem, but we didn't reach any conclusion on what to do, since
   the time ran out.

We agreed to continue the discussion at the BPF office hours next 5
January 2023.

In the meanwhile, below in this email is a slightly updated version of
the material used to go thru the topics during the discussion.  If there
is any mistake or if you see that our understanding of the
problem/situation is not correct, please point it out.  If you want to
add more information, please do so by replying to this thread.

Finally, it was agreed that we (GCC BPF hackers) would send Yonghong our
github accounts so he can subscribe us to notifications in the llvm
phabricator, so we can be aware of potentially ABI/breaking changes at
the time they are discussed, and not afterwards scanning bpf@vger.  I
alredy sent him the information.

Thank you for your time today.  It is appreciated.

----

* Background
** Attribute/annotation ordering in declarations

   - By "ordering" in this context we understand the correspondence of
     annotations (like compiler attributes) with language entities.

     Given:

     : char * __attribute ((btf_type_tag("foo"))) * buf;

     What is the language entity (type) to which the attribute
     applies?  Is it char?  Is it char*?  Is it char**?

   - This is decided by compilers at parse-time and reflected in the
     structure of the AST.

   - Therefore this is a source-level concept, and it is not amenable
     by generating debug info with a different ordering.

** Compilers implement different orderings for different kinds of attributes

   - The sparse compiler implements a particular ordering for sparse
     annotations, which look like:

     : __attribute__ ((address_space(user)))

     The ordering is not documented, but can be determined by running
     sparse and/or by looking at the sparse parser source code.

     [The fact sparse annotations look like compiler GNU-like
      attributes must not mislead us: they are not the same and they
      don't order the same way.]

   - The GCC and clang compilers implement another particular ordering
     for GNU-like compiler attributes, which have the form:

     : __attribute__ ((noinline))

     The ordering is specified in the "Attribute Syntax" chapter in
     the GCC manual.

   - The GCC and clang compilers implement another particular ordering
     for C2x compiler attributes, which have the form:

     : [[noinline]]

     The ordering is specified in the C2x specification.

   All three entities above, sparse annotations, GNU-like compiler
   attributes and C2x compiler attributes use different orderings,
   i.e. the correspondence of the annotation/attribute with the
   language entity it annotates (type, declaration, ...) varies.

* The problem
** Re-use of sparse annotations as compiler attributes

  - At some point we got the request from the BPF community to support
    a couple of new compiler attributes in GCC:

    : btf_type_tag ("foobar")
    : btf_decl_tag ("foobar")

    These attributes translate into annotations in both BTF and DWARF
    debugging information.

    clang/llvm had already been modified to support these attributes,
    the new BTF kind, and a new DW_TAG_LLVM_annotation DWARF DIE type.

  - David Faust wrote an implementation, and used as a criteria the
    several examples he saw in the kernel sources and what clang
    generates.  He added the same support in BTF and a compatible
    DW_TAG_GNU_annotation for DWARF.

    He obviously implemented the new compiler attributes as what they
    are: GNU-like compiler attributes, and saw that they were being
    ordered differently to what the kernel sources would expect.

  - Then we investigated and learned that the intention at the BPF
    side was to re-use the already existing address_space sparse
    annotations in the kernel sources as btf_*_tag compiler
    attributes.

  - At present, it looks like support for several other sparse-like
    attributes have been added to clang (which ones?) and as a result
    now clang effectively has GNU-like compiler attributes that order
    differently than all other attributes.

  - At LPC 2022 we met David Malcolm, who has been working in
    implementing sparse annotations and warnings natively in GCC.  He
    found similar problems with the ordering of the attributes.

** Scope of the problem

   - This happens when sparse annotations are re-used as compiler
     attributes.

   - sparse annotations are NOT compiler attributes => there is NO bug
     in sparse.

   - These annotations are used very widely in the Linux kernel sources.

   - Compilers involved: sparse, GCC and clang

   - sparse annotations can appear in both BPF C programs and kernel
     source files.

** Ordering discrepancies between sparse annotations and compiler attributes

   Using these definitions:

   : #ifdef __CHECKER__
   : #define __tag1 __attribute__((noderef, address_space(__tag1)))
   : #define __tag2 __attribute__((noderef, address_space(__tag2)))
   : #else
   : #define __tag1 __attribute__((btf_type_tag("tag1")))
   : #define __tag2 __attribute__((btf_type_tag("tag2")))
   : #endif

   __CHECKER__ is defined by sparse, so the former definitions are
   used * while compiling the examples with sparse.

   The latter definitions with btf_type_tag are used when compiling
   with GCC or clang.

   Compilers compared:

   - sparse

     + Ordering information is obtained via sparse debug printed as
       part of a warning.

   - GCC

     + Slightly outdated 'master' branch, with the patches to support
       btf_type_tag and btf_decl_tag applied:
       https://gcc.gnu.org/git/gcc.git:refs/users/dfaust/heads/btf-type-tag-new-rebase

     + Ordering information is obtained via the generated BTF and
       DWARF.  Note that we have verified that the BTF and DWARF
       results are accurately representative of the GCC internal
       compiler structures which produce it, and that therefore they
       reflect how the compiler has parsed the declarations.

   - clang

     + clang version 16.0.0. main branch of December 14 2022.
     + Ordering information is obtained via the generated BTF and
       DWARF.

    Probably the simplest case involving pointers, shows how both
    compilers are mangling the generated BTF in order to reflect the
    sparse expectations (presumably the same hack is in pahole when
    it does the DWARF -> BTF translation):

    : char __tag1 * buf;

    - sparse
      + __tag1 applies to type char.
      : got char [noderef] __tag1 *[addressable] [toplevel] buf
    - clang
      + According to DWARF __tag1 applies to type char
      + According to BTF  __tag1 applies to type char
      : DWARF
      : 0x00000023:   DW_TAG_variable
      :                 DW_AT_name	("buf")
      :                 DW_AT_type	(0x0000002e "char *")
      :
      : 0x0000002e:   DW_TAG_pointer_type
      :                 DW_AT_type	(0x00000037 "char")
      :
      : 0x00000033:     DW_TAG_LLVM_annotation
      :                 DW_AT_name	("btf_type_tag")
      :                 DW_AT_const_value	("tag1")
      : BTF
      : [1] TYPE_TAG 'tag1' type_id=3
      : [2] PTR '(anon)' type_id=1
      : [3] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=SIGNED
      : [4] VAR 'buf' type_id=2, linkage=global
      :
      : buf -> ptr -> 'tag1' -> char
    - GCC
      + According to DWARF __tag1 applies to type char
      + According to BTF  __tag1 applies to type char
      : DWARF
      : 0x0000002e:   DW_TAG_variable
      :                 DW_AT_name	("buf")
      :                 DW_AT_type	(0x00000044 "char *")
      :
      : 0x00000044:   DW_TAG_pointer_type
      :                 DW_AT_type	(0x00000057 "char")
      :
      : 0x0000004d:     DW_TAG_LLVM_annotation
      :                 DW_AT_name	("btf_type_tag")
      :                 DW_AT_const_value	("tag1")
      :
      : BTF
      : [1] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=SIGNED
      : [2] PTR '(anon)' type_id=3
      : [3] TYPE_TAG 'tag1' type_id=1
      : [4] VAR 'buf' type_id=2, linkage=global
      :
      : buf -> ptr -> 'tag1' -> char

    The next example is from from bpf-next/arch/x86/kernel/kgdb.c:184
    and shows cases where both the DWARF and the BTF are different:

    : struct perf_event * __percpu *pev;

    - sparse
      + __percpu applies to struct perf_event*
      : got struct perf_event *[noderef] __percpu *[addressable] [toplevel] pev
    - clang
      + According to DWARF __percpu applies to struct perf_event*
      + According to BTF __percpu applies to struct perf_event*
      : DWARF
      : 0x00000023:   DW_TAG_variable
      :                 DW_AT_name	("pev")
      :                 DW_AT_type	(0x0000002e "perf_event **")
      :
      : 0x0000002e:   DW_TAG_pointer_type
      :                 DW_AT_type	(0x00000037 "perf_event *")
      :
      : 0x00000033:     DW_TAG_LLVM_annotation
      :                 DW_AT_name	("btf_type_tag")
      :                 DW_AT_const_value	("percpu")
      :
      : 0x00000037:   DW_TAG_pointer_type
      :                 DW_AT_type	(0x0000003c "perf_event")
      : BTF
      : [2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
      : [4] PTR '(anon)' type_id=5
      : [5] STRUCT 'perf_event' size=4 vlen=1
      : 'a' type_id=2 bits_offset=0
      : [8] TYPE_TAG 'percpu' type_id=4
      : [9] PTR '(anon)' type_id=8
      : [13] VAR 'pev' type_id=9, linkage=global
      :
      : pev -> ptr -> 'percpu' -> ptr -> struct perf_event
    - GCC
      + According to DWARF __percpu applies to struct perf_event
      + According to BTF __percpu applies to struct perf_event
      : DWARF
      : 0x00000023:   DW_TAG_variable
      :                 DW_AT_name	("pev")
      :                 DW_AT_type	(0x00000039 "perf_event **")
      :
      : 0x00000039:   DW_TAG_pointer_type (perf_event **)
      :                 DW_AT_type	(0x0000003f "perf_event *")
      :
      : 0x0000003f:   DW_TAG_pointer_type (perf_event *)
      :                 DW_AT_type	(0x0000001e "perf_event")
      :
      : 0x00000045:     DW_TAG_LLVM_annotation
      :                 DW_AT_name	("btf_type_tag")
      :                 DW_AT_const_value	("percpu")
      : BTF
      : [1] STRUCT 'perf_event' size=4 vlen=1
      : 'a' type_id=2 bits_offset=0
      : [2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
      : [3] PTR '(anon)' type_id=4
      : [4] TYPE_TAG 'percpu' type_id=1
      : [5] PTR '(anon)' type_id=3
      : [14] VAR 'pev' type_id=5, linkage=global
      :
      : pev -> ptr -> ptr -> 'percpu' -> perf_event

    The following case shows two different attributes applied to
    pointers-to-pointers:

    : int __tag1 * __tag2 * g;

    - sparse
      + __tag1 applies to int, __tag2 applies to int*
      : got int [noderef] __tag1 *[noderef] __tag2 *[addressable] [toplevel] g
    - clang
      + According to DWARF __tag1 applies to int, __tag2 applies to int*
      + According to BTF __tag1 applies to int, __tag2 applies to int*
      : DWARF
      : 0x00000023:   DW_TAG_variable
      :                 DW_AT_name	("g")
      :                 DW_AT_type	(0x0000002e "int **")
      :
      : 0x0000002e:   DW_TAG_pointer_type
      :                 DW_AT_type	(0x00000037 "int *")
      :
      : 0x00000033:     DW_TAG_LLVM_annotation
      :                 DW_AT_name	("btf_type_tag")
      :                 DW_AT_const_value	("tag2")
      :
      : 0x00000037:   DW_TAG_pointer_type
      :                 DW_AT_type	(0x00000040 "int")
      :
      : 0x0000003c:     DW_TAG_LLVM_annotation
      :                 DW_AT_name	("btf_type_tag")
      :                 DW_AT_const_value	("tag1")
      : BTF
      : [1] TYPE_TAG 'tag2' type_id=4
      : [2] PTR '(anon)' type_id=1
      : [3] TYPE_TAG 'tag1' type_id=5
      : [4] PTR '(anon)' type_id=3
      : [5] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
      : [6] VAR 'g' type_id=2, linkage=global
      :
      : 'g' -> ptr -> 'tag2' -> ptr -> 'tag1' -> int
    - GCC
      + According to DWARF __tag1 applies to int*, __tag2 applies to int
      + According to BTF __tag1 applies to int*, __tag2 applies to int
      : DWARF
      : 0x0000002e:   DW_TAG_variable
      :                 DW_AT_name	("g")
      :                 DW_AT_type	(0x00000042 "int **")
      :
      : 0x00000042:   DW_TAG_pointer_type
      :                 DW_AT_type	(0x00000055 "int *")
      :
      : 0x0000004b:     DW_TAG_LLVM_annotation
      :                 DW_AT_name	("btf_type_tag")
      :                 DW_AT_const_value	("tag1")
      :
      : 0x00000055:   DW_TAG_pointer_type
      :                 DW_AT_type	(0x00000068 "int")
      :
      : 0x0000005e:     DW_TAG_LLVM_annotation
      :                 DW_AT_name	("btf_type_tag")
      :                 DW_AT_const_value	("tag2")
      : BTF
      : [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
      : [2] PTR '(anon)' type_id=3
      : [3] TYPE_TAG 'tag2' type_id=1
      : [4] PTR '(anon)' type_id=5
      : [5] TYPE_TAG 'tag1' type_id=2
      : [6] VAR 'g' type_id=4, linkage=global
      :
      : 'g' -> ptr -> 'tag1' -> ptr -> 'tag2' -> int

    What follows is a typical kernel example.  We get "right" results
    in GCC just because both tags are the same attributes, but
    effectively we hit the same problem than in the previous example:

    : int __tag1 * __tag1 * val;

    - sparse
      + __tag1 applies to both int and int*
      : got int [noderef] __tag1 *[noderef] __tag1 *[addressable] [toplevel] val
    - clang
      + According to DWARF __tag1 applies to both int and int*
      + According to BTF __tag1 applies to both int and int*
      : DWARF
      : 0x0000001e:   DW_TAG_variable
      :                 DW_AT_name	("val")
      :                 DW_AT_type	(0x00000029 "int **")
      :
      : 0x00000029:   DW_TAG_pointer_type
      :                 DW_AT_type	(0x00000032 "int *")
      :
      : 0x0000002e:     DW_TAG_LLVM_annotation
      :                 DW_AT_name	("btf_type_tag")
      :                 DW_AT_const_value	("tag1")
      :
      : 0x00000032:   DW_TAG_pointer_type
      :                 DW_AT_type	(0x0000003b "int")
      :
      : 0x00000037:     DW_TAG_LLVM_annotation
      :                 DW_AT_name	("btf_type_tag")
      :                 DW_AT_const_value	("tag1")
      : BTF
      : [1] TYPE_TAG 'tag1' type_id=4
      : [2] PTR '(anon)' type_id=1
      : [3] TYPE_TAG 'tag1' type_id=5
      : [4] PTR '(anon)' type_id=3
      : [5] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
      : [6] VAR 'val' type_id=2, linkage=global
      :
      : 'val' -> ptr -> 'tag1' -> ptr -> 'tag1' -> int
    - GCC
      + According to DWARF __tag1 applies to both int and int*
      + According to BTF __tag1 applies to both int and int*
      : DWARF
      : 0x0000001e:   DW_TAG_variable
      :                 DW_AT_name	("val")
      :                 DW_AT_type	(0x00000034 "int **")
      :
      : 0x00000034:   DW_TAG_pointer_type
      :                 DW_AT_type	(0x00000047 "int *")
      :
      : 0x0000003d:     DW_TAG_LLVM_annotation
      :                 DW_AT_name	("btf_type_tag")
      :                 DW_AT_const_value	("tag1")
      :
      : 0x00000047:   DW_TAG_pointer_type
      :                 DW_AT_type	(0x0000005a "int")
      :
      : 0x00000050:     DW_TAG_LLVM_annotation
      :                 DW_AT_name	("btf_type_tag")
      :                 DW_AT_const_value	("tag1")
      : BTF
      : [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
      : [2] PTR '(anon)' type_id=3
      : [3] TYPE_TAG 'tag1' type_id=1
      : [4] PTR '(anon)' type_id=5
      : [5] TYPE_TAG 'tag1' type_id=2
      : [6] VAR 'val' type_id=4, linkage=global
      :
      : 'val' -> ptr -> 'tag1' -> ptr -> 'tag1' -> int

    This last example is another variation of pointer-to-pointer with
    different tags and different positions:

    : int * __tag1 * __tag2 h;

    - sparse
      +  __tag1 applies to int*, __tag2 applies to int**
      : got int *[noderef] __tag1 *[addressable] [noderef] [toplevel] __tag2 h
    - clang
      + According to DWARF __tag1 applies to int*, no __tag2 (??).
      + According to BTF  __tag1 applies to int*, no __tag2 (??).
      : DWARF
      : 0x00000023:   DW_TAG_variable
      :                 DW_AT_name	("h")
      :                 DW_AT_type	(0x0000002e "int **")
      :
      : 0x0000002e:   DW_TAG_pointer_type
      :                 DW_AT_type	(0x00000037 "int *")
      :
      : 0x00000033:     DW_TAG_LLVM_annotation
      :                 DW_AT_name	("btf_type_tag")
      :                 DW_AT_const_value	("tag1")
      : BTF
      : [1] TYPE_TAG 'tag1' type_id=3
      : [2] PTR '(anon)' type_id=1
      : [3] PTR '(anon)' type_id=4
      : [4] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
      : [5] VAR 'h' type_id=2, linkage=global
      :
      : 'h' -> ptr -> 'tag1' -> ptr -> int
    - GCC
      + According to DWARF __tag1 applies to int*, __tag2 applies to int**
      + According to BTF __tag1 applies to int, __tag2 applies to int*
      : DWARF
      : 0x0000002e:   DW_TAG_variable
      :                 DW_AT_name	("h")
      :                 DW_AT_type	(0x00000042 "int **")
      :
      : 0x00000042:   DW_TAG_pointer_type
      :                 DW_AT_type	(0x00000055 "int *")
      :
      : 0x0000004b:     DW_TAG_LLVM_annotation
      :                 DW_AT_name	("btf_type_tag")
      :                 DW_AT_const_value	("tag2")
      :
      : 0x00000055:   DW_TAG_pointer_type
      :                 DW_AT_type	(0x00000068 "int")
      :
      : 0x0000005e:     DW_TAG_LLVM_annotation
      :                 DW_AT_name	("btf_type_tag")
      :                 DW_AT_const_value	("tag1")
      : BTF
      : [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
      : [2] PTR '(anon)' type_id=3
      : [3] TYPE_TAG 'tag1' type_id=1
      : [4] PTR '(anon)' type_id=5
      : [5] TYPE_TAG 'tag2' type_id=2
      : [6] VAR 'h' type_id=4, linkage=global
      :
      : 'h' -> ptr -> 'tag2' -> ptr -> 'tag1' -> int

* Solutions
** Reorder sparse annotations in kernel source

   [Julia Jawall kindly adapted coccinelle so it can handle this and
    the following possible solution involving modifying kernel
    sources.]

   This would basically involve writing and applying a coccinelle
   script to reorder something like this:

   : int __tag1 * __tag2 * g;

   into this

   : int __tag2 * __tag1 * g;

   where

   : #define __tag1 __attribute__((address_space(__tag1)))

   This totally breaks sparse so it is hardly a realistic solution.

** Add compiler attributes corresponding to existing sparse annotations to the kernel source

   This would involve writing and applying a coccinelle script to add
   compiler attributes corresponding to the existing sparse
   annotations.

   So from this:

   : int __tag1 * __tag2 * g;

   We would go to:

   : int __tag1 __comp_tag2 * __tag2 __comp_tag1 * g;

   where

   : #if __CHECKER__
   : #  define __tag1 __attribute__((address_space(__tag1)))
   : #else
   : #  define __tag1
   : #endif
   : #define __comp_tag1 __attribute__((btf_type_tag("tag1")))

   This would not break sparse, but would lead to a lot of (non
   trivial) redundancy in the kernel sources: both sparse annotations
   and compiler attributes will denote the same thing.

** Adding native support for sparse annotations to GCC and clang

   This would make both GCC and clang to handle sparse annotations
   using the sparse ordering.  This is what David Malcolm has been
   working on.

   There is some previous work From Tom Tromey to support the
   following sparse annotations: address_space, noderef and force.
   See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59850 (But AFAIK
   Tom's patch also falls in the mistake of ordering the attributes
   using GNU-like attribute ordering, not the sparse ordering?)

   For example, in this C declaration:

   : int * __attribute__((address_space(__tag1))) * p;

   The compilers would recognize that as a sparse annotation (and not
   a GNU-like compiler attribute) and parse it in a way it applies to
   int* instead of int**.

   At least two problems with this approach:

   1. This could be confusing for users, and difficult to get applied
      upstream, because sparse annotations look like GNU-like compiler
      attributes, despite not being the same thing (ordering
      differently.)

   2. sparse lacks support for btf_type_tag and btf_decl_tag, and
      these compiler attributes are more general than the existing
      sparse annotations.

** Adding GNU-like compiler attributes to GCC and clang with sparse ordering

   This is what clang does right now (or tries to do).

   Solves problem 2. above, but not 1.

   Maybe problem 1. could be alleviated by oddly-ordering compiler
   attributes to have a distinguished name, such as using a 'sparse'
   prefix:

   : __attribute__((sparse_btf_type_tag ("tag1")))

** Other solutions?



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux