Hello all.

Find below the notes we intend to use in today's BPF office hour to discuss possible solutions for the current limitations in the DWARF representation of the btf_type_tag C attributes. We hope to decide on one of them so we can move forward with this. The list of suggested solutions below is of course not closed: these are just the ones we could think of. Better alternatives and suggestions are very welcome!

BTF tag support in DWARF

* Current situation: annotations as child DIEs for pointees

DWARF information is structured as a tree of DIE nodes. Nodes can have attributes associated with them, as well as zero or more DIE children.

clang extends DWARF with a new tag (DIE type), =DW_TAG_LLVM_annotation=. Nodes of this type associate a tag name with a tag value, which is also a string. Example:

: DW_TAG_LLVM_annotation
:   DW_AT_name "btf_type_tag"
:   DW_AT_const_value "user"

At the moment, clang generates =DW_TAG_LLVM_annotation= nodes as children of =DW_TAG_pointer_type= nodes. The intended semantics is that the annotation applies to the pointed-to type. For example (indentation reflects the parent-children tree structure):

: DW_TAG_pointer_type
:   DW_AT_type "int"
:
:   DW_TAG_LLVM_annotation
:     DW_AT_name "btf_type_tag"
:     DW_AT_const_value "tag1"

The example above associates an annotation named "btf_type_tag" with value "tag1" with the type pointed to by its containing pointer_type, which is "int".

This approach has one advantage. The new =DW_TAG_LLVM_annotation= nodes are effectively used as attributes, so DWARF consumers that do not understand this DIE type can safely ignore them.

But this approach also has a big caveat: it cannot express annotations on types that are not pointed to by a pointer type. This obviously affects simple types such as =int=. It also affects pointer types that are not pointees themselves.
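The following minimal C sketch makes the current semantics concrete. It models just enough of a DIE to answer "which =btf_type_tag= applies to type T?" under clang's current scheme. The =struct die= layout, the =DIE_*= enumerators and the function names are hypothetical simplifications for illustration only. They are not the API of libdwarf, elfutils or LLVM. An annotation only counts when it hangs off a pointer DIE that references T. As a result, a type that no pointer DIE references can never be reported as tagged.

```c
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 4

/* Hypothetical, heavily simplified DIE model (not a real library API). */
enum die_tag { DIE_BASE_TYPE, DIE_POINTER_TYPE, DIE_LLVM_ANNOTATION };

struct die {
  enum die_tag tag;
  const char *name;        /* DW_AT_name */
  const char *const_value; /* DW_AT_const_value (annotations only) */
  const struct die *type;  /* DW_AT_type; NULL stands for void */
  const struct die *children[MAX_CHILDREN]; /* NULL-terminated if not full */
};

/* Value of the first btf_type_tag annotation child of D, or NULL. */
static const char *first_btf_type_tag(const struct die *d)
{
  for (size_t i = 0; i < MAX_CHILDREN && d->children[i] != NULL; i++) {
    const struct die *c = d->children[i];
    if (c->tag == DIE_LLVM_ANNOTATION && strcmp(c->name, "btf_type_tag") == 0)
      return c->const_value;
  }
  return NULL;
}

/* Current clang semantics: type T is tagged only if some pointer_type DIE
   whose DW_AT_type is T carries the annotation as a child.  Scans all N
   DIEs in DIES and returns the first such tag value, or NULL. */
const char *current_tag_of(const struct die *t,
                           const struct die *const *dies, size_t n)
{
  for (size_t i = 0; i < n; i++) {
    if (dies[i]->tag == DIE_POINTER_TYPE && dies[i]->type == t) {
      const char *v = first_btf_type_tag(dies[i]);
      if (v != NULL)
        return v;
    }
  }
  return NULL;
}
```

Suppose the =int **= DIE carries a =tag1= annotation child. Then =current_tag_of= reports =tag1= for =int *=. It reports nothing for =int **= itself, because no pointer DIE references it.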
For example, in the following declaration it is not possible to associate the tag =__tag2= with the type =int **= (note that this uses sparse/clang ordering):

: int * __tag1 * __tag2 h;

- sparse
  + __tag1 applies to int*, __tag2 applies to int**
  : got int *[noderef] __tag1 *[addressable] [noderef] [toplevel] __tag2 h
- clang
  + According to DWARF __tag1 applies to int*, no __tag2 (??).
  + According to BTF __tag1 applies to int*, no __tag2 (??).
  : DWARF
  : 0x00000023: DW_TAG_variable
  :               DW_AT_name ("h")
  :               DW_AT_type (0x0000002e "int **")
  :
  : 0x0000002e: DW_TAG_pointer_type
  :               DW_AT_type (0x00000037 "int *")
  :
  : 0x00000033:   DW_TAG_LLVM_annotation
  :                 DW_AT_name ("btf_type_tag")
  :                 DW_AT_const_value ("tag1")
  :
  : BTF
  : [1] TYPE_TAG 'tag1' type_id=3
  : [2] PTR '(anon)' type_id=1
  : [3] PTR '(anon)' type_id=4
  : [4] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
  : [5] VAR 'h' type_id=2, linkage=global
  :
  : 'h' -> ptr -> 'tag1' -> ptr -> int

* A note about `void'

The DWARF specification recommends denoting the =void= C type with a =DW_TAG_unspecified_type= DIE named "void". However, neither GCC nor LLVM follows this recommendation. Instead, both denote the =void= type by omitting the =DW_AT_type= attribute from whatever node contains it. For example, for a pointer to =void=:

: 3      DW_TAG_pointer_type    [no children]

Note also that the kernel sources have sparse annotations like:

: void __user * data;

In sparse ordering, this means that the annotated type is =void=. Therefore, whichever design we choose must be able to tag the =void= basic type.

GDB and other DWARF consumers understand the spec-recommended way to denote =void=.

* Solution 1: annotations as qualifiers

A possible solution is to handle =DW_TAG_LLVM_annotation= the same way C type qualifiers are handled in DWARF, by including annotations in the type chain linked by =DW_AT_type= attributes.
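Under this scheme, a consumer that walks a =DW_AT_type= chain must treat annotation DIEs the way it already treats =DW_TAG_const_type= and =DW_TAG_volatile_type=. It steps over them and keeps following =DW_AT_type=. The C sketch below illustrates that walk over a hypothetical, simplified =struct die=. The names are made up for illustration, and real consumers such as libdwarf or elfutils have their own APIs. The walk collects the =btf_type_tag= values it passes and treats a missing =DW_AT_type= (a NULL pointer here) as =void=.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified DIE model for Solution 1 (not a real API). */
enum die_tag { DIE_BASE_TYPE, DIE_POINTER_TYPE, DIE_CONST_TYPE,
               DIE_VOLATILE_TYPE, DIE_LLVM_ANNOTATION };

struct die {
  enum die_tag tag;
  const char *name;        /* DW_AT_name */
  const char *const_value; /* DW_AT_const_value (annotations only) */
  const struct die *type;  /* DW_AT_type; NULL stands for void */
};

/* Follow DW_AT_type links past qualifier-like DIEs (const, volatile and,
   under Solution 1, annotations).  Stores up to MAX btf_type_tag values in
   TAGS and their count in *NTAGS.  Returns the first non-qualifier DIE,
   or NULL if the chain ends in void. */
const struct die *strip_qualifiers(const struct die *t, const char **tags,
                                   size_t max, size_t *ntags)
{
  *ntags = 0;
  while (t != NULL && (t->tag == DIE_CONST_TYPE
                       || t->tag == DIE_VOLATILE_TYPE
                       || t->tag == DIE_LLVM_ANNOTATION)) {
    if (t->tag == DIE_LLVM_ANNOTATION
        && strcmp(t->name, "btf_type_tag") == 0 && *ntags < max)
      tags[(*ntags)++] = t->const_value;
    t = t->type;
  }
  return t;
}
```

A consumer written before this change would stop at the annotation DIE and see an unknown type instead of =int=. That is the compatibility cost of treating annotations as qualifiers.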
For example:

: DW_TAG_pointer_type
:   DW_AT_type ("btf_type_tag")
:
: DW_TAG_LLVM_annotation
:   DW_AT_name "btf_type_tag"
:   DW_AT_const_value "tag1"
:   DW_AT_type ("int")
:
: DW_TAG_base_type
:   DW_AT_name ("int")

Note how the =LLVM_annotation= now links to the annotated type through =DW_AT_type=. It also acts as a type itself, linked from =DW_TAG_pointer_type=.

Advantages of this approach:

- It makes sense for annotations to be implemented as qualifiers, because they actually qualify a target type.
- This approach is fully flexible and makes it possible to annotate any type, qualified or not, pointed-to or not.
- The resulting DWARF looks like the BTF.
- It can handle annotated `void', as currently generated by GCC and clang/LLVM:

  : DW_TAG_LLVM_annotation
  :   DW_AT_name "btf_type_tag"
  :   DW_AT_const_value "tag1"
  :   DW_AT_type NULL

Disadvantages of this approach:

- Implementing this is more elaborate. It requires DWARF consumers to understand this new DIE type in order to follow the type chains in the tree, since =DW_TAG_LLVM_annotation= must now be expected in any =DW_AT_type= reference.
- This breaks DWARF. That makes it very difficult to implement as a compiler extension, and it will likely have to become part of the DWARF standard.
- This is not backwards compatible with what clang currently generates.

* Solution 2: annotations as child DIEs

This approach keeps the =DW_TAG_LLVM_annotation= DIE with its current internal structure, but associates it with its parent type DIE. (Note that this is not the same as being linked by a =DW_AT_type= attribute, as in Solution 1.)

This means that the following DWARF tree:

: DW_TAG_pointer_type
:   DW_AT_type "int"
:
:   DW_TAG_LLVM_annotation
:     DW_AT_name "btf_type_tag"
:     DW_AT_const_value "tag1"

denotes an annotation that applies to the type =int*=, not to the pointee type =int=.

Advantages of this approach:

- This approach makes it possible to annotate any type, qualified or not, pointed-to or not.
- This can easily be implemented as a compiler extension. Existing DWARF consumers will happily ignore the new annotation DIEs if they don't support them, and the type chains in the tree remain the same.
- Easy to implement in GCC.

Disadvantages of this approach:

- This may increase the number of type nodes in the tree. For example, a tagged =int*= and a non-tagged =int*= will now have to be represented by two different DIEs.
- This is not backwards compatible with what clang currently generates, in the case of pointer types.
- It cannot handle annotated `void' as currently generated by GCC and clang/LLVM. For tagged =void=, we would need to generate unspecified types named "void":

  : DW_TAG_unspecified_type
  :   DW_AT_name "void"
  :
  :   DW_TAG_LLVM_annotation
  :     DW_AT_name "btf_type_tag"
  :     DW_AT_const_value "tag1"

  DWARF consumers should support this form, since it is what the DWARF spec recommends, and GDB certainly recognizes it.

* Solution 3a: annotations as a set of attributes

Another possible solution is to extend DWARF with a pair of new attributes: =DW_AT_annotation_tag= and =DW_AT_annotation_value=. Annotated types will have these attributes defined. Example:

: DW_TAG_pointer_type
:   DW_AT_type "int"
:   DW_AT_annotation_tag "btf_type_tag"
:   DW_AT_annotation_value "tag1"

Note that in this example the tag applies to the pointer type (=int*=), not to the pointee.

Advantages of this approach:

- This can easily be implemented as a compiler extension. Existing DWARF consumers will happily ignore the new attributes if they don't support them, and the type chains in the tree remain the same.
- This is backwards compatible with what clang currently generates.
- Easy to implement in GCC.

Disadvantages of this approach:

- This may increase the number of type nodes in the tree. For example, a tagged =int*= and a non-tagged =int*= will now have to be represented by two different DIEs.
- It cannot handle annotated `void' as currently generated by GCC and clang/LLVM. For tagged =void=, we would need to generate unspecified types named "void":

  : DW_TAG_unspecified_type
  :   DW_AT_name "void"
  :   DW_AT_annotation_tag "btf_type_tag"
  :   DW_AT_annotation_value "tag1"

  DWARF consumers should support this form, since it is what the DWARF spec recommends, and GDB certainly recognizes it.

* Solution 3b: annotations as single "structured" attributes

This is like 3a, but it uses a single attribute, =DW_AT_annotation=, instead of two. The tag name and the tag value are encoded in the attribute's string value using some convention. For example:

: DW_TAG_pointer_type
:   DW_AT_type "int"
:   DW_AT_annotation "btf_type_tag tag1"

Here the tag name is "btf_type_tag" and the tag value is "tag1", using the convention that a whitespace character separates them.

Advantages over 3a:

- Using a single attribute is more robust. It rules out a node having =DW_AT_annotation_tag= but not =DW_AT_annotation_value=.
- It is easier to extend, since the string stored in the =DW_AT_annotation= attribute can be made as complex as desired. This is better than adding more =DW_AT_annotation_FOO= attributes.
- This is backwards compatible with what clang currently generates.
- Easy to implement in GCC.

Disadvantages over 3a:

- This requires defining conventions for the structure of the string stored in the attribute.
- This risks overzealous design: "let's store a JSON tree in =DW_AT_annotation= for future extensions instead of continuing to bother with DWARF".
- It cannot handle annotated `void' as currently generated by GCC and clang/LLVM. For tagged =void=, we would need to generate unspecified types named "void":

  : DW_TAG_unspecified_type
  :   DW_AT_name "void"
  :   DW_AT_annotation "btf_type_tag tag1"

  DWARF consumers should support this form, since it is what the DWARF spec recommends, and GDB certainly recognizes it.
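Every consumer of the 3b encoding needs a parser for its string convention. Below is a minimal C sketch of one. It assumes the tag name ends at the first whitespace character and that everything after that single separator is the value. The notes do not say whether values may themselves contain whitespace; this sketch allows it. =parse_annotation= is a hypothetical helper, not an existing API.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Split a DW_AT_annotation string such as "btf_type_tag tag1" into a tag
   name (text before the first whitespace character) and a tag value (text
   after that single separator).  Both parts are copied, NUL-terminated,
   into caller-provided buffers.  Returns 0 on success, or -1 if there is no
   separator, either part is empty, or a buffer is too small. */
int parse_annotation(const char *s, char *name, size_t name_sz,
                     char *value, size_t value_sz)
{
  const char *sep = s;
  while (*sep != '\0' && !isspace((unsigned char)*sep))
    sep++;
  if (*sep == '\0' || sep == s)
    return -1;                          /* no separator, or empty name */

  size_t name_len = (size_t)(sep - s);
  const char *val = sep + 1;            /* skip exactly one separator */
  size_t val_len = strlen(val);
  if (val_len == 0 || name_len >= name_sz || val_len >= value_sz)
    return -1;

  memcpy(name, s, name_len);
  name[name_len] = '\0';
  memcpy(value, val, val_len + 1);      /* copies the terminating NUL too */
  return 0;
}
```

With this sketch, "btf_type_tag tag1" yields the name "btf_type_tag" and the value "tag1". A string with no separator, or with an empty name or value, is rejected. Other questions remain open: whether values may contain spaces, how they would be escaped, and whether one attribute may carry several annotations. These are exactly the conventions listed above as a disadvantage of 3b.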