BTF tag support in DWARF (notes for today's BPF Office Hours)

Hello all.

Find below the notes we intend to use in today's BPF office hour to
discuss possible solutions for the current limitations in the DWARF
representation of the btf_type_tag C attributes, and hopefully decide on
one so we can move forward with this.

The list of suggested solutions below is of course not closed: these are
just the ones we could think of.  Better alternatives and suggestions
are very welcome!

BTF tag support in DWARF

* Current situation: annotations as children DIEs for pointees

  DWARF information is structured as a tree of DIE nodes.  Nodes can
  have attributes associated to them, as well as zero or more DIE
  children.
   
  clang extends DWARF with a new tag (DIE type) =DW_TAG_LLVM_annotation=.
  Nodes of this type are used to associate a tag name with a tag value that
  is also a string.

  Example:

  :  DW_TAG_LLVM_annotation
  :     DW_AT_name        "btf_type_tag"
  :     DW_AT_const_value "user"

  At the moment, clang generates =DW_TAG_LLVM_annotation= nodes as children
  of =DW_TAG_pointer_type= nodes.  The intended semantics are that the
  annotation applies to the pointed-to type.

  For example (indentation reflects the parent-children tree structure):

  : DW_TAG_pointer_type
  :   DW_AT_type "int"
  :   DW_TAG_LLVM_annotation
  :     DW_AT_name        "btf_type_tag"
  :     DW_AT_const_value "tag1"

  The example above associates an annotation with name "btf_type_tag" and
  value "tag1" with the type pointed to by its containing pointer_type,
  which is "int".

  This approach has the advantage that, since the new
  =DW_TAG_LLVM_annotation= nodes are effectively used as attributes, they are
  safely ignored by DWARF consumers that do not understand this DIE type.

  But this approach also has a big caveat: annotations on types that are not
  the pointee of some pointer type cannot be expressed in this design.  This
  obviously impacts simple types such as =int=, but also pointer types that
  are not pointees themselves.

  For example, it is not possible to associate the tag =__tag2= with the
  type =int **= in this example (note this is sparse/clang ordering):

  : int * __tag1 * __tag2 h;

  - sparse
    +  __tag1 applies to int*, __tag2 applies to int**
    : got int *[noderef] __tag1 *[addressable] [noderef] [toplevel] __tag2 h
  - clang
    + According to DWARF __tag1 applies to int*, no __tag2 (??).
    + According to BTF  __tag1 applies to int*, no __tag2 (??).
    : DWARF
    : 0x00000023:   DW_TAG_variable
    :                 DW_AT_name	("h")
    :                 DW_AT_type	(0x0000002e "int **")
    :
    : 0x0000002e:   DW_TAG_pointer_type
    :                 DW_AT_type	(0x00000037 "int *")
    :
    : 0x00000033:     DW_TAG_LLVM_annotation
    :                 DW_AT_name	("btf_type_tag")
    :                 DW_AT_const_value	("tag1")
    : BTF
    : [1] TYPE_TAG 'tag1' type_id=3
    : [2] PTR '(anon)' type_id=1
    : [3] PTR '(anon)' type_id=4
    : [4] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
    : [5] VAR 'h' type_id=2, linkage=global
    :
    : 'h' -> ptr -> 'tag1' -> ptr -> int
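
  To make the BTF chain above easier to follow, here is a minimal sketch
  that walks the five records from the dump and renders the same chain.
  The =struct type= layout, the =kind= enum and =render_chain= are
  simplified stand-ins invented for this note; they are not the kernel's
  BTF structures or any libbpf API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-ins for BTF kinds; this is not the kernel's struct btf_type. */
enum kind { K_NONE, K_INT, K_PTR, K_TYPE_TAG, K_VAR };

struct type {
    enum kind kind;
    const char *name;  /* NULL for anonymous types */
    int type_id;       /* id of the referenced type, 0 if none */
};

/* Slot 0 is unused so that array indices match the ids in the BTF dump. */
static const struct type types[] = {
    [1] = { K_TYPE_TAG, "tag1", 3 },
    [2] = { K_PTR,      NULL,   1 },
    [3] = { K_PTR,      NULL,   4 },
    [4] = { K_INT,      "int",  0 },
    [5] = { K_VAR,      "h",    2 },
};

/* Write the chain of types reachable from `id` into `buf`. */
static void render_chain(int id, char *buf, size_t len)
{
    size_t used = 0;

    assert(len > 0);
    buf[0] = '\0';
    while (id != 0) {
        assert(id > 0 && id < (int)(sizeof types / sizeof types[0]));
        const struct type *t = &types[id];
        const char *sep = used ? " -> " : "";
        int n;

        if (t->kind == K_PTR)
            n = snprintf(buf + used, len - used, "%sptr", sep);
        else if (t->kind == K_INT)
            n = snprintf(buf + used, len - used, "%s%s", sep, t->name);
        else
            n = snprintf(buf + used, len - used, "%s'%s'", sep, t->name);
        assert(n >= 0 && (size_t)n < len - used);
        used += (size_t)n;
        id = t->type_id;
    }
}
```

  Calling =render_chain(5, buf, sizeof buf)= starts at the =VAR= record for
  =h= and follows =type_id= until it reaches the =INT= record.  Only one
  =TYPE_TAG= record appears on the way, which is the missing =__tag2=
  discussed above.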

* A note about `void'

  The DWARF specification recommends denoting the =void= C type by
  generating a DIE with =DW_TAG_unspecified_type= and name "void".

  However, both GCC and LLVM do _not_ follow this recommendation; instead,
  they denote the =void= type by the absence of a =DW_AT_type= attribute in
  the containing node.

  Example, for a pointer to =void=:

  : 3      DW_TAG_pointer_type    [no children]

  Note also that the kernel sources have sparse annotations like:

  : void __user * data;

  With sparse ordering, this means that the annotated type is =void=.
  Therefore it is very important to be able to tag the =void= basic type in
  this design.

  GDB and other DWARF consumers understand the spec-recommended way to denote
  =void=.
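
  As an illustration of the two conventions, a consumer that wants to
  accept both could test for =void= roughly as follows.  This is only a
  sketch: =struct die=, the =TAG_*= enumerators and =pointee_is_void= are
  a hypothetical in-memory model, not the API of GDB, libdw or any other
  real DWARF reader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory DIE model, not a real DWARF library API. */
enum die_tag { TAG_BASE_TYPE, TAG_POINTER_TYPE, TAG_UNSPECIFIED_TYPE };

struct die {
    enum die_tag tag;
    const char *name;        /* DW_AT_name, or NULL if absent */
    const struct die *type;  /* DW_AT_type, or NULL if absent */
};

/* Return true if the pointer DIE `ptr` points to void under either convention. */
static bool pointee_is_void(const struct die *ptr)
{
    assert(ptr != NULL && ptr->tag == TAG_POINTER_TYPE);

    /* GCC/LLVM convention: no DW_AT_type attribute at all. */
    if (ptr->type == NULL)
        return true;

    /* Spec-recommended convention: DW_TAG_unspecified_type named "void". */
    return ptr->type->tag == TAG_UNSPECIFIED_TYPE &&
           ptr->type->name != NULL &&
           strcmp(ptr->type->name, "void") == 0;
}
```

  The first branch is the case that matters for the solutions below: with
  the GCC/LLVM convention there is no DIE for =void= at all, so there is
  nothing to attach a child DIE or an attribute to.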

* Solution 1: annotations as qualifiers

  A possible solution for this is to handle =DW_TAG_LLVM_annotation= the same
  way as C type qualifiers are handled in DWARF: including them in the type
  chain linked by =DW_AT_type= attributes.

  For example:

  : DW_TAG_pointer_type
  :   DW_AT_type ("btf_type_tag")
  :
  : DW_TAG_LLVM_annotation
  :   DW_AT_name        "btf_type_tag"
  :   DW_AT_const_value "tag1"
  :   DW_AT_type        ("int")
  :
  : DW_TAG_base_type
  :   DW_AT_name ("int")

  Note how now the =LLVM_annotation= has the annotated type linked by
  =DW_AT_type=, and acts itself as a type linked from =DW_TAG_pointer_type=.

  Advantages of this approach:

  - It makes sense for annotations to be implemented as qualifiers, because
    they actually qualify a target type.

  - This approach is totally flexible and makes it possible to annotate any
    type, qualified or not, pointed-to or not.

  - The resulting DWARF looks like the BTF.

  - It can handle annotated `void', as currently generated by GCC and
    clang/LLVM:

    :   DW_TAG_LLVM_annotation
    :     DW_AT_name        "btf_type_tag"
    :     DW_AT_const_value "tag1"
    :     DW_AT_type NULL

  Disadvantages of this approach:

  - Implementing this is more elaborate, and it requires DWARF consumers to
    understand this new DIE type in order to follow the type chains in the
    tree: =DW_TAG_LLVM_annotation= should now be expected in any =DW_AT_type=
    reference.

  - This breaks DWARF, making it very difficult to implement as a compiler
    extension, and will likely require making it part of DWARF.

  - This is not backwards compatible with what clang currently generates.
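
  To give an idea of what Solution 1 asks of a consumer, the sketch below
  follows a =DW_AT_type= reference through any annotation nodes, much as
  one would skip over qualifiers, and collects the tag values found on
  the way.  The =struct die= model and =skip_annotations= are
  hypothetical names made up for this note, not part of any existing
  DWARF library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory DIE model, not a real DWARF library API. */
enum die_tag { TAG_BASE_TYPE, TAG_POINTER_TYPE, TAG_ANNOTATION };

struct die {
    enum die_tag tag;
    const char *name;         /* DW_AT_name, or NULL if absent */
    const char *const_value;  /* DW_AT_const_value (annotations only) */
    const struct die *type;   /* DW_AT_type; NULL means void */
};

/*
 * Follow `ref` through annotation nodes and return the first DIE that is
 * not an annotation (NULL for void).  The values of btf_type_tag
 * annotations met on the way are stored in `tags`, their count in `*ntags`.
 */
static const struct die *skip_annotations(const struct die *ref,
                                          const char **tags, size_t max,
                                          size_t *ntags)
{
    assert(tags != NULL && ntags != NULL);
    *ntags = 0;
    while (ref != NULL && ref->tag == TAG_ANNOTATION) {
        if (ref->name != NULL && strcmp(ref->name, "btf_type_tag") == 0) {
            assert(*ntags < max);
            tags[(*ntags)++] = ref->const_value;
        }
        ref = ref->type;
    }
    return ref;
}
```

  A consumer that lacks a step like this one stops at the annotation node
  and never reaches the underlying type, which is the compatibility
  problem listed in the disadvantages.  Note also that a tagged =void=
  falls out naturally: the annotation is collected and the function
  returns NULL.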

* Solution 2: annotations as children DIEs

  This approach involves keeping the =DW_TAG_LLVM_annotation= DIE, with the
  same internal structure it has now, but associating it with the type DIE
  that is its parent.  (Note this is not the same as being linked by a
  =DW_AT_type= attribute like in Solution 1.)

  This means that this DWARF tree:

  : DW_TAG_pointer_type
  :   DW_AT_type "int"
  :   DW_TAG_LLVM_annotation
  :     DW_AT_name        "btf_type_tag"
  :     DW_AT_const_value "tag1"

  denotes an annotation that applies to the type =int*=, not to the pointee
  type =int=.

  Advantages of this approach:

  - This approach makes it possible to annotate any type, qualified or not,
    pointed-to or not.

  - This can easily be implemented as a compiler extension, because existing
    DWARF consumers will happily ignore the new child DIEs in case they don't
    support them;  the type chains in the tree remain the same.

  - Easy to implement in GCC.

  Disadvantages of this approach:

  - This may result in an increased number of type nodes in the tree.  For
    example, we may have a tagged =int*= and a non-tagged =int*=, which now
    will have to be implemented using two different DIEs.
   
  - This is not backwards compatible with what clang currently generates, in
    the case of pointer types.

  - It cannot handle annotated `void' as currently generated by GCC and
    clang/LLVM, so for tagged =void= we would need to generate unspecified
    types with name "void":

    : DW_TAG_unspecified_type
    :   DW_AT_name "void"
    :   DW_TAG_LLVM_annotation
    :     DW_AT_name        "btf_type_tag"
    :     DW_AT_const_value "tag1"

    But this should be supported by DWARF consumers, as per the DWARF spec,
    and it is certainly recognized by GDB.
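
  Under Solution 2 a consumer only has to look at the children of the
  type DIE it is interested in.  The sketch below assumes a hypothetical
  =struct die= with an array of children and a helper =type_tag_of=; both
  are invented for this note.  It returns only the first =btf_type_tag=
  child, which is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory DIE model, not a real DWARF library API. */
enum die_tag { TAG_BASE_TYPE, TAG_POINTER_TYPE, TAG_ANNOTATION };

struct die {
    enum die_tag tag;
    const char *name;            /* DW_AT_name, or NULL if absent */
    const char *const_value;     /* DW_AT_const_value (annotations only) */
    const struct die *type;      /* DW_AT_type, or NULL if absent */
    const struct die *children;  /* array of child DIEs, or NULL */
    size_t nchildren;
};

/*
 * Solution 2 semantics: a child annotation tags its parent DIE.
 * Returns the value of the first btf_type_tag child, or NULL if none.
 */
static const char *type_tag_of(const struct die *type_die)
{
    assert(type_die != NULL);
    for (size_t i = 0; i < type_die->nchildren; i++) {
        const struct die *c = &type_die->children[i];

        if (c->tag == TAG_ANNOTATION && c->name != NULL &&
            strcmp(c->name, "btf_type_tag") == 0)
            return c->const_value;
    }
    return NULL;
}
```

  With this reading, =int * __tag1 * __tag2 h= can be described by giving
  the =int*= DIE a "tag1" child and the =int**= DIE a "tag2" child, so
  both tags are recoverable without changing the =DW_AT_type= chain.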

* Solution 3a: annotations as set of attributes

  Another possible solution is to extend DWARF with a pair of new
  attributes, =DW_AT_annotation_tag= and =DW_AT_annotation_value=.

  Annotated types will have these attributes defined.  Example:

  : DW_TAG_pointer_type
  :   DW_AT_type "int"
  :   DW_AT_annotation_tag   "btf_type_tag"
  :   DW_AT_annotation_value "tag1"

  Note that in this example the tag applies to the pointer type, not the
  pointee, i.e. to =int*=.

  Advantages of this approach:

  - This can easily be implemented as a compiler extension, because existing
    DWARF consumers will happily ignore the new attributes in case they don't
    support them;  the type chains in the tree remain the same.

  - This is backwards compatible with what clang currently generates.

  - Easy to implement in GCC.
   
  Disadvantages of this approach:

  - This may result in an increased number of type nodes in the tree.  For
    example, we may have a tagged =int*= and a non-tagged =int*=, which now
    will have to be implemented using two different DIEs.

  - It cannot handle annotated `void' as currently generated by GCC and
    clang/LLVM, so for tagged =void= we would need to generate unspecified
    types with name "void":

    : DW_TAG_unspecified_type
    :   DW_AT_name "void"
    :   DW_AT_annotation_tag   "btf_type_tag"
    :   DW_AT_annotation_value "tag1"

    But this should be supported by DWARF consumers, as per the DWARF spec,
    and it is certainly recognized by GDB.
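
  With two separate attributes, a consumer has to decide what to do when
  only one of them is present.  The notes do not specify this, so the
  sketch below simply assumes that such a DIE is reported as malformed.
  The =struct die= fields, the =annot_status= enum and =check_annotation=
  are hypothetical names used for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Possible outcomes when reading the proposed attribute pair from a DIE. */
enum annot_status { ANNOT_NONE, ANNOT_OK, ANNOT_MALFORMED };

/* Hypothetical model of a type DIE carrying the two proposed attributes. */
struct die {
    const char *annotation_tag;    /* DW_AT_annotation_tag, or NULL if absent */
    const char *annotation_value;  /* DW_AT_annotation_value, or NULL if absent */
};

/* Classify a DIE according to which of the two attributes it carries. */
static enum annot_status check_annotation(const struct die *d)
{
    assert(d != NULL);
    if (d->annotation_tag == NULL && d->annotation_value == NULL)
        return ANNOT_NONE;       /* untagged type */
    if (d->annotation_tag == NULL || d->annotation_value == NULL)
        return ANNOT_MALFORMED;  /* assumption: half a pair is an error */
    return ANNOT_OK;
}
```

  Whether a lone attribute should be rejected, ignored or given a default
  meaning is a policy question that a real specification of 3a would have
  to settle.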
   
* Solution 3b: annotations as single "structured" attributes

  This is like 3a, but using a single attribute =DW_AT_annotation= instead of
  two, and encoding the tag name and the tag value in the string value using
  some convention.

  For example:

  : DW_TAG_pointer_type
  :   DW_AT_type "int"
  :   DW_AT_annotation "btf_type_tag tag1"

  Meaning the tag name is "btf_type_tag" and the tag value is "tag1", using
  the convention that a whitespace character separates them.

  Advantages over 3a:

  - Using a single attribute is more robust, since it eliminates the possible
    situation of a node having =DW_AT_annotation_tag= and not
    =DW_AT_annotation_value=.

  - It is easier to extend it, since the string stored in the
    =DW_AT_annotation= attribute may be made as complex as desired.  Better
    than adding more =DW_AT_annotation_FOO= attributes.

  - This is backwards compatible with what clang currently generates.

  - Easy to implement in GCC.
   
  Disadvantages over 3a:

  - This requires defining conventions specifying the structure of the string
    stored in the attribute.

  - This has the danger of overzealous design: "let's store a JSON tree in
    =DW_AT_annotation= for future extensions instead of continuing to bother
    with DWARF".

  - It cannot handle annotated `void' as currently generated by GCC and
    clang/LLVM, so for tagged =void= we would need to generate unspecified
    types with name "void":

    : DW_TAG_unspecified_type
    :   DW_AT_name "void"
    :   DW_AT_annotation  "btf_type_tag tag1"

    But this should be supported by DWARF consumers, as per the DWARF spec,
    and it is certainly recognized by GDB.
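
  The whitespace convention of 3b could be decoded along the following
  lines.  This is a sketch under two assumptions that the notes leave
  open: the name ends at the first whitespace character, and everything
  after the following run of whitespace is the value.  =split_annotation=
  is a hypothetical helper, not an existing API.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Split a DW_AT_annotation string such as "btf_type_tag tag1" into its
 * name and value.  Returns false if either part is missing or does not
 * fit in the caller's buffer.
 */
static bool split_annotation(const char *s, char *name, size_t name_len,
                             char *value, size_t value_len)
{
    assert(s != NULL && name != NULL && value != NULL);

    /* The name is everything up to the first whitespace character. */
    size_t n = 0;
    while (s[n] != '\0' && !isspace((unsigned char)s[n]))
        n++;
    if (n == 0 || s[n] == '\0' || n >= name_len)
        return false;

    /* Skip the separator; the rest of the string is the value. */
    const char *v = s + n;
    while (isspace((unsigned char)*v))
        v++;
    size_t vlen = strlen(v);
    if (vlen == 0 || vlen >= value_len)
        return false;

    memcpy(name, s, n);
    name[n] = '\0';
    memcpy(value, v, vlen + 1);
    return true;
}
```

  Even this small decoder has to make choices (leading whitespace, values
  containing spaces, length limits), which illustrates the first
  disadvantage above: the string format needs a written convention.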


