BTF tag support in DWARF (notes for today's BPF Office Hours)

"Jose E. Marchesi" <jose.marchesi@xxxxxxxxxx> · Thu, 05 Jan 2023 12:37:57 +0100

Hello all.

Find below the notes we intend to use in today's BPF office hour to
discuss possible solutions for the current limitations in the DWARF
representation of the btf_type_tag C attributes, and hopefully decide on
one so we can move forward with this.

The list of suggested solutions below is of course not exhaustive: these
are just the ones we could think of.  Better alternatives and suggestions
are very welcome!

BTF tag support in DWARF

* Current situation: annotations as children DIEs for pointees

  DWARF information is structured as a tree of DIE nodes.  Nodes can
  have attributes associated to them, as well as zero or more DIE
  children.

  clang extends DWARF with a new tag (DIE type) =DW_TAG_LLVM_annotation=.
  Nodes of this type are used to associate a tag name with a tag value that
  is also a string.

  Example:

  :  DW_TAG_LLVM_annotation
  :     DW_AT_name        "btf_type_tag"
  :     DW_AT_const_value "user"

  At the moment, clang generates =DW_TAG_LLVM_annotation= nodes as children
  of =DW_TAG_pointer_type= nodes.  The intended semantics is that the
  annotation applies to the pointed-to type.

  For example (indentation reflects the parent-children tree structure):

  : DW_TAG_pointer_type
  :   DW_AT_type "int"
  :   DW_TAG_LLVM_annotation
  :     DW_AT_name        "btf_type_tag"
  :     DW_AT_const_value "tag1"

  The example above associates an annotation with name "btf_type_tag" and
  value "tag1" with the type pointed to by its containing pointer_type,
  which is "int".
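
  As a minimal sketch of how a consumer could read this convention, the
  following models DIEs with a hypothetical =Die= class (not part of any
  real DWARF library) and returns the pointee together with the tag values
  found among the children of the pointer DIE:

```python
from dataclasses import dataclass, field


@dataclass
class Die:
    """Hypothetical, simplified stand-in for a DWARF DIE."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def pointee_tags(pointer_die):
    """Current clang convention: annotation children of a pointer DIE
    describe the pointed-to type, not the pointer itself."""
    tags = [child.attrs["DW_AT_const_value"]
            for child in pointer_die.children
            if child.tag == "DW_TAG_LLVM_annotation"
            and child.attrs.get("DW_AT_name") == "btf_type_tag"]
    return pointer_die.attrs.get("DW_AT_type"), tags


int_type = Die("DW_TAG_base_type", {"DW_AT_name": "int"})
ptr = Die("DW_TAG_pointer_type", {"DW_AT_type": int_type},
          [Die("DW_TAG_LLVM_annotation",
               {"DW_AT_name": "btf_type_tag",
                "DW_AT_const_value": "tag1"})])

pointee, tags = pointee_tags(ptr)
print(pointee.attrs["DW_AT_name"], tags)  # int ['tag1']
```

  Note that nothing in this model can attach a tag to =ptr= itself, which
  is the limitation discussed next.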

  This approach has the advantage that, since the new
  =DW_TAG_LLVM_annotation= nodes are effectively used as attributes, they are
  safely ignored by DWARF consumers that do not understand this DIE type.

  But this approach also has a big caveat: annotations on types that are not
  pointed to by some pointer type cannot be expressed in this design.  This
  obviously impacts simple types such as =int=, but also pointer types that
  are not pointees themselves.

  For example, it is not possible to associate the tag =__tag2= with the type
  =int **= in this example (note that this uses the sparse/clang ordering):

  : int * __tag1 * __tag2 h;

  - sparse
    +  __tag1 applies to int*, __tag2 applies to int**
    : got int *[noderef] __tag1 *[addressable] [noderef] [toplevel] __tag2 h
  - clang
    + According to DWARF __tag1 applies to int*, no __tag2 (??).
    + According to BTF  __tag1 applies to int*, no __tag2 (??).
    : DWARF
    : 0x00000023:   DW_TAG_variable
    :                 DW_AT_name	("h")
    :                 DW_AT_type	(0x0000002e "int **")
    :
    : 0x0000002e:   DW_TAG_pointer_type
    :                 DW_AT_type	(0x00000037 "int *")
    :
    : 0x00000033:     DW_TAG_LLVM_annotation
    :                 DW_AT_name	("btf_type_tag")
    :                 DW_AT_const_value	("tag1")
    : BTF
    : [1] TYPE_TAG 'tag1' type_id=3
    : [2] PTR '(anon)' type_id=1
    : [3] PTR '(anon)' type_id=4
    : [4] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
    : [5] VAR 'h' type_id=2, linkage=global
    :
    : 'h' -> ptr -> 'tag1' -> ptr -> int
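
  The chain on the last line can be reproduced by following the =type_id=
  links of the BTF records.  The sketch below transcribes the five records
  from the dump into a plain dictionary (the dictionary layout and the
  =chain= helper are illustrative, not a real BTF API):

```python
# BTF records from the dump above: id -> (kind, name, referenced type_id)
btf = {
    1: ("TYPE_TAG", "tag1", 3),
    2: ("PTR", None, 1),
    3: ("PTR", None, 4),
    4: ("INT", "int", None),
    5: ("VAR", "h", 2),
}


def chain(records, type_id):
    """Follow type_id links until a record that references nothing."""
    parts = []
    while type_id is not None:
        kind, name, next_id = records[type_id]
        if kind in ("VAR", "TYPE_TAG"):
            parts.append(f"'{name}'")
        elif kind == "PTR":
            parts.append("ptr")
        else:
            parts.append(name)
        type_id = next_id
    return " -> ".join(parts)


print(chain(btf, 5))  # 'h' -> ptr -> 'tag1' -> ptr -> int
```

  Only one =TYPE_TAG= record exists, so =tag2= is absent from the chain.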

* A note about `void'

  The DWARF specification recommends denoting the =void= C type by
  generating a DIE with =DW_TAG_unspecified_type= and name "void".

  However, both GCC and LLVM do _not_ follow this recommendation.  Instead,
  they denote the =void= type by the absence of a =DW_AT_type= attribute in
  the containing node.

  Example, for a pointer to =void=:

  : 3      DW_TAG_pointer_type    [no children]

  Note also that the kernel sources have sparse annotations like:

  : void __user * data;

  Which, using sparse ordering, means that the type which is annotated is
  =void=.  Therefore it is very important to be able to tag the =void= basic
  type in this design.

  GDB and other DWARF consumers understand the spec-recommended way to denote
  =void=.
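
  A consumer that wants to cope with both conventions has to treat a
  missing =DW_AT_type= and an unspecified type named "void" alike.  A
  minimal sketch, modelling DIEs as plain dictionaries (the keys and the
  =denotes_void= helper are assumptions made for illustration):

```python
def denotes_void(type_ref):
    """type_ref is the DIE referenced by DW_AT_type, or None if absent."""
    if type_ref is None:
        return True  # GCC/LLVM convention: no DW_AT_type at all
    return (type_ref.get("tag") == "DW_TAG_unspecified_type"
            and type_ref.get("name") == "void")  # spec-recommended form


void_ptr_implicit = {"tag": "DW_TAG_pointer_type"}
void_ptr_explicit = {"tag": "DW_TAG_pointer_type",
                     "type": {"tag": "DW_TAG_unspecified_type",
                              "name": "void"}}
int_ptr = {"tag": "DW_TAG_pointer_type",
           "type": {"tag": "DW_TAG_base_type", "name": "int"}}

for pointer in (void_ptr_implicit, void_ptr_explicit, int_ptr):
    print(denotes_void(pointer.get("type")))  # True, True, False
```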

* Solution 1: annotations as qualifiers

  A possible solution for this is to handle =DW_TAG_LLVM_annotation= the same
  way as C type qualifiers are handled in DWARF: including them in the type
  chain linked by =DW_AT_type= attributes.

  For example:

  : DW_TAG_pointer_type
  :   DW_AT_type ("btf_type_tag")
  :
  : DW_TAG_LLVM_annotation
  :   DW_AT_name        "btf_type_tag"
  :   DW_AT_const_value "tag1"
  :   DW_AT_type        ("int")
  :
  : DW_TAG_base_type
  :   DW_AT_name ("int")

  Note how now the =LLVM_annotation= has the annotated type linked by
  =DW_AT_type=, and acts itself as a type linked from =DW_TAG_pointer_type=.
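
  To make the effect on consumers concrete, here is a small sketch that
  walks such a type chain.  DIEs are modelled as dictionaries, and the
  =type_chain= helper and the =LEGACY_TAGS= set are hypothetical; the
  second call stands for a consumer that has not been taught about the new
  DIE type:

```python
int_t = {"tag": "DW_TAG_base_type", "name": "int"}
annot = {"tag": "DW_TAG_LLVM_annotation", "name": "btf_type_tag",
         "value": "tag1", "type": int_t}
ptr = {"tag": "DW_TAG_pointer_type", "type": annot}

# Tags an existing consumer might already accept in a type chain.
LEGACY_TAGS = {"DW_TAG_pointer_type", "DW_TAG_const_type",
               "DW_TAG_base_type"}


def type_chain(die, known_tags):
    """Follow DW_AT_type links, failing on a DIE type we do not know."""
    chain = []
    while die is not None:  # a missing type ends the chain (void)
        if die["tag"] not in known_tags:
            raise ValueError(f"unexpected {die['tag']} in type chain")
        if die["tag"] == "DW_TAG_LLVM_annotation":
            chain.append(f"tag:{die['value']}")
        else:
            chain.append(die["tag"])
        die = die.get("type")
    return chain


print(type_chain(ptr, LEGACY_TAGS | {"DW_TAG_LLVM_annotation"}))
try:
    type_chain(ptr, LEGACY_TAGS)
except ValueError as err:
    print("legacy consumer:", err)
```

  The same walk handles an annotation whose =type= is absent, which is how
  a tagged =void= would appear under this solution.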

  Advantages of this approach:

  - It makes sense for annotations to be implemented as qualifiers, because
    they actually qualify a target type.

  - This approach is totally flexible and makes it possible to annotate any
    type, qualified or not, pointed-to or not.

  - The resulting DWARF looks like the BTF.

  - It can handle annotated `void', as currently generated by GCC and
    clang/LLVM:

    :   DW_TAG_LLVM_annotation
    :     DW_AT_name        "btf_type_tag"
    :     DW_AT_const_value "tag1"
    :     DW_AT_type NULL

  Disadvantages of this approach:

  - Implementing this is more elaborate, and it requires DWARF consumers to
    understand this new DIE type in order to follow the type chains in the
    tree: =DW_TAG_LLVM_annotation= should now be expected in any =DW_AT_type=
    reference.

  - This breaks DWARF, making it very difficult to implement as a compiler
    extension, and will likely require making it part of DWARF.

  - This is not backwards compatible with what clang currently generates.

* Solution 2: annotations as children DIEs

  This approach involves keeping the =DW_TAG_LLVM_annotation= DIE, with the
  same internal structure it has now, but associating it with the type DIE
  that is its parent.  (Note this is not the same as being linked by a
  =DW_AT_type= attribute as in Solution 1.)

  This means that this DWARF tree:

  : DW_TAG_pointer_type
  :   DW_AT_type "int"
  :   DW_TAG_LLVM_annotation
  :     DW_AT_name        "btf_type_tag"
  :     DW_AT_const_value "tag1"

  Denotes an annotation that applies to the type =int*=, not the pointee type
  =int=.
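
  Under this reading, both tags of the earlier declaration
  =int * __tag1 * __tag2 h= become expressible.  A minimal sketch, using
  dictionaries as stand-ins for DIEs (the =make= and =own_tags= helpers are
  hypothetical):

```python
def make(tag, type_=None, name=None, tags=()):
    """Build a DIE whose annotation children apply to the DIE itself."""
    return {"tag": tag, "type": type_, "name": name,
            "children": [{"tag": "DW_TAG_LLVM_annotation",
                          "name": "btf_type_tag", "value": t}
                         for t in tags]}


def own_tags(die):
    """Tags that apply to this DIE under the Solution 2 semantics."""
    return [child["value"] for child in die["children"]
            if child["tag"] == "DW_TAG_LLVM_annotation"]


int_t = make("DW_TAG_base_type", name="int")
int_p = make("DW_TAG_pointer_type", int_t, tags=["tag1"])    # int * __tag1
int_pp = make("DW_TAG_pointer_type", int_p, tags=["tag2"])   # ... * __tag2
plain_int_p = make("DW_TAG_pointer_type", int_t)             # untagged int *

for die in (int_pp, int_p, plain_int_p):
    print(own_tags(die))  # ['tag2'], then ['tag1'], then []
```

  The tagged and the untagged =int*= end up as two separate DIEs in this
  model.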

  Advantages of this approach:

  - This approach makes it possible to annotate any type, qualified or not,
    pointed-to or not.

  - This can easily be implemented as a compiler extension, because existing
    DWARF consumers will happily ignore the new DIEs in case they don't
    support them;  the type chains in the tree remain the same.

  - Easy to implement in GCC.

  Disadvantages of this approach:

  - This may result in an increased number of type nodes in the tree.  For
    example, we may have a tagged =int*= and a non-tagged =int*=, which now
    will have to be implemented using two different DIEs.

  - This is not backwards-compatible with what clang currently generates, in
    the case of pointer types.

  - It cannot handle annotated `void' as currently generated by GCC and
    clang/LLVM, so for tagged =void= we would need to generate unspecified
    types with name "void":

    : DW_TAG_unspecified_type
    :   DW_AT_name "void"
    :   DW_TAG_LLVM_annotation
    :     DW_AT_name        "btf_type_tag"
    :     DW_AT_const_value "tag1"

    But this should be supported by DWARF consumers, as per the DWARF spec,
    and it is certainly recognized by GDB.

* Solution 3a: annotations as set of attributes

  Another possible solution is to extend DWARF with a pair of new
  attributes, =DW_AT_annotation_tag= and =DW_AT_annotation_value=.

  Annotated types will have these attributes defined.  Example:

  : DW_TAG_pointer_type
  :   DW_AT_type "int"
  :   DW_AT_annotation_tag   "btf_type_tag"
  :   DW_AT_annotation_value "tag1"

  Note that in this example the tag applies to the pointer type, not the
  pointee, i.e. to =int*=.
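
  A consumer reading the pair could look roughly like the sketch below.
  The attribute names are the ones proposed here; the dictionary
  representation and the =annotation_of= helper are assumptions made for
  illustration, including the choice to reject a half-specified pair:

```python
def annotation_of(attrs):
    """Return (tag, value) for an annotated DIE, or None if untagged."""
    tag = attrs.get("DW_AT_annotation_tag")
    value = attrs.get("DW_AT_annotation_value")
    if tag is None and value is None:
        return None
    if tag is None or value is None:
        # One attribute of the pair without the other is malformed.
        raise ValueError("incomplete annotation attribute pair")
    return tag, value


tagged_ptr = {"DW_AT_type": "int",
              "DW_AT_annotation_tag": "btf_type_tag",
              "DW_AT_annotation_value": "tag1"}
plain_ptr = {"DW_AT_type": "int"}

print(annotation_of(tagged_ptr))  # ('btf_type_tag', 'tag1')
print(annotation_of(plain_ptr))   # None
```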

  Advantages of this approach:

  - This can easily be implemented as a compiler extension, because existing
    DWARF consumers will happily ignore the new attributes in case they don't
    support them;  the type chains in the tree remain the same.

  - This is backwards compatible with what clang currently generates.

  - Easy to implement in GCC.

  Disadvantages of this approach:

  - This may result in an increased number of type nodes in the tree.  For
    example, we may have a tagged =int*= and a non-tagged =int*=, which now
    will have to be implemented using two different DIEs.

  - It cannot handle annotated `void' as currently generated by GCC and
    clang/LLVM, so for tagged =void= we would need to generate unspecified
    types with name "void":

    : DW_TAG_unspecified_type
    :   DW_AT_name "void"
    :   DW_AT_annotation_tag   "btf_type_tag"
    :   DW_AT_annotation_value "tag1"

    But this should be supported by DWARF consumers, as per the DWARF spec,
    and it is certainly recognized by GDB.

* Solution 3b: annotations as single "structured" attributes

  This is like 3a, but using a single attribute =DW_AT_annotation= instead of
  two, and encoding the tag name and the tag value in the string value using
  some convention.

  For example:

  : DW_TAG_pointer_type
  :   DW_AT_type "int"
  :   DW_AT_annotation "btf_type_tag tag1"

  Meaning the tag name is "btf_type_tag" and the tag value is "tag1", using
  the convention that a whitespace character separates them.
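
  Decoding such a string could be as simple as the following sketch.  It
  assumes that the first run of whitespace is the separator and that the
  remainder is the value; the =parse_annotation= name is hypothetical and
  the exact convention would still have to be agreed on:

```python
def parse_annotation(text):
    """Split 'NAME VALUE' at the first run of whitespace."""
    parts = text.split(None, 1)
    if len(parts) != 2:
        raise ValueError(f"malformed annotation string: {text!r}")
    name, value = parts
    return name, value


print(parse_annotation("btf_type_tag tag1"))  # ('btf_type_tag', 'tag1')
```

  Under this assumption a value may itself contain spaces, while a tag
  name may not.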

  Advantages over 3a:

  - Using a single attribute is more robust, since it eliminates the possible
    situation of a node having =DW_AT_annotation_tag= and not
    =DW_AT_annotation_value=.

  - It is easier to extend it, since the string stored in the
    =DW_AT_annotation= attribute may be made as complex as desired.  Better
    than adding more =DW_AT_annotation_FOO= attributes.

  - This is backwards compatible with what clang currently generates.

  - Easy to implement in GCC.

  Disadvantages over 3a:

  - This requires defining conventions specifying the structure of the string
    stored in the attribute.

  - This has the danger of overzealous design: "let's store a JSON tree in
    =DW_AT_annotation= for future extensions instead of continuing to bother
    with DWARF".

  - It cannot handle annotated `void' as currently generated by GCC and
    clang/LLVM, so for tagged =void= we would need to generate unspecified
    types with name "void":

    : DW_TAG_unspecified_type
    :   DW_AT_name "void"
    :   DW_AT_annotation  "btf_type_tag tag1"

    But this should be supported by DWARF consumers, as per the DWARF spec,
    and it is certainly recognized by GDB.