Of the two problems discussed: 1. DW_TAG_LLVM_annotation not being able to denote annotations to non-pointed based types. clang currently ignores these instances. We discussed two possible options to deal with this: 1.1 To continue ignoring these cases in the front-end, keep the dwarf expressiveness limitation, and document it. 1.2 To change DW_TAG_LLVM_annotation so it behaves like a qualifier DIE (like const, volatile, etc.) so it can apply to any type. 2. The ordering problem: sparse annotations order differently than GNU/C99 compiler attributes. Therefore translating 1-to-1 from sparse annotations to compiler attributes results in attributes with different syntax than normal compiler attributes. This was accepted in clang. But found resistance in GCC when we sent the first patch series. During the meeting we went thru several possible ways of dealing with this problem, but we didn't reach any conclusion on what to do, since the time ran out. We agreed to continue the discussion at the BPF office hours next 5 January 2023. In the meanwhile, below in this email is a slightly updated version of the material used to go thru the topics during the discussion. If there is any mistake or if you see that our understanding of the problem/situation is not correct, please point it out. If you want to add more information, please do so by replying to this thread. Finally, it was agreed that we (GCC BPF hackers) would send Yonghong our github accounts so he can subscribe us to notifications in the llvm phabricator, so we can be aware of potentially ABI/breaking changes at the time they are discussed, and not afterwards scanning bpf@vger. I alredy sent him the information. Thank you for your time today. It is appreciated. ---- * Background ** Attribute/annotation ordering in declarations - By "ordering" in this context we understand the correspondence of annotations (like compiler attributes) with language entities. Given: : char * __attribute ((btf_type_tag("foo"))) * buf; What is the language entity (type) to which the attribute applies? Is it char? Is it char*? Is it char**? - This is decided by compilers at parse-time and reflected in the structure of the AST. - Therefore this is a source-level concept, and it is not amenable by generating debug info with a different ordering. ** Compilers implement different orderings for different kinds of attributes - The sparse compiler implements a particular ordering for sparse annotations, which look like: : __attribute__ ((address_space(user))) The ordering is not documented, but can be determined by running sparse and/or by looking at the sparse parser source code. [The fact sparse annotations look like compiler GNU-like attributes must not mislead us: they are not the same and they don't order the same way.] - The GCC and clang compilers implement another particular ordering for GNU-like compiler attributes, which have the form: : __attribute__ ((noinline)) The ordering is specified in the "Attribute Syntax" chapter in the GCC manual. - The GCC and clang compilers implement another particular ordering for C2x compiler attributes, which have the form: : [[noinline]] The ordering is specified in the C2x specification. All three entities above, sparse annotations, GNU-like compiler attributes and C2x compiler attributes use different orderings, i.e. the correspondence of the annotation/attribute with the language entity it annotates (type, declaration, ...) varies. * The problem ** Re-use of sparse annotations as compiler attributes - At some point we got the request from the BPF community to support a couple of new compiler attributes in GCC: : btf_type_tag ("foobar") : btf_decl_tag ("foobar") These attributes translate into annotations in both BTF and DWARF debugging information. clang/llvm had already been modified to support these attributes, the new BTF kind, and a new DW_TAG_LLVM_annotation DWARF DIE type. - David Faust wrote an implementation, and used as a criteria the several examples he saw in the kernel sources and what clang generates. He added the same support in BTF and a compatible DW_TAG_GNU_annotation for DWARF. He obviously implemented the new compiler attributes as what they are: GNU-like compiler attributes, and saw that they were being ordered differently to what the kernel sources would expect. - Then we investigated and learned that the intention at the BPF side was to re-use the already existing address_space sparse annotations in the kernel sources as btf_*_tag compiler attributes. - At present, it looks like support for several other sparse-like attributes have been added to clang (which ones?) and as a result now clang effectively has GNU-like compiler attributes that order differently than all other attributes. - At LPC 2022 we met David Malcolm, who has been working in implementing sparse annotations and warnings natively in GCC. He found similar problems with the ordering of the attributes. ** Scope of the problem - This happens when sparse annotations are re-used as compiler attributes. - sparse annotations are NOT compiler attributes => there is NO bug in sparse. - These annotations are used very widely in the Linux kernel sources. - Compilers involved: sparse, GCC and clang - sparse annotations can appear in both BPF C programs and kernel source files. ** Ordering discrepancies between sparse annotations and compiler attributes Using these definitions: : #ifdef __CHECKER__ : #define __tag1 __attribute__((noderef, address_space(__tag1))) : #define __tag2 __attribute__((noderef, address_space(__tag2))) : #else : #define __tag1 __attribute__((btf_type_tag("tag1"))) : #define __tag2 __attribute__((btf_type_tag("tag2"))) : #endif __CHECKER__ is defined by sparse, so the former definitions are used * while compiling the examples with sparse. The latter definitions with btf_type_tag are used when compiling with GCC or clang. Compilers compared: - sparse + Ordering information is obtained via sparse debug printed as part of a warning. - GCC + Slightly outdated 'master' branch, with the patches to support btf_type_tag and btf_decl_tag applied: https://gcc.gnu.org/git/gcc.git:refs/users/dfaust/heads/btf-type-tag-new-rebase + Ordering information is obtained via the generated BTF and DWARF. Note that we have verified that the BTF and DWARF results are accurately representative of the GCC internal compiler structures which produce it, and that therefore they reflect how the compiler has parsed the declarations. - clang + clang version 16.0.0. main branch of December 14 2022. + Ordering information is obtained via the generated BTF and DWARF. Probably the simplest case involving pointers, shows how both compilers are mangling the generated BTF in order to reflect the sparse expectations (presumably the same hack is in pahole when it does the DWARF -> BTF translation): : char __tag1 * buf; - sparse + __tag1 applies to type char. : got char [noderef] __tag1 *[addressable] [toplevel] buf - clang + According to DWARF __tag1 applies to type char + According to BTF __tag1 applies to type char : DWARF : 0x00000023: DW_TAG_variable : DW_AT_name ("buf") : DW_AT_type (0x0000002e "char *") : : 0x0000002e: DW_TAG_pointer_type : DW_AT_type (0x00000037 "char") : : 0x00000033: DW_TAG_LLVM_annotation : DW_AT_name ("btf_type_tag") : DW_AT_const_value ("tag1") : BTF : [1] TYPE_TAG 'tag1' type_id=3 : [2] PTR '(anon)' type_id=1 : [3] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=SIGNED : [4] VAR 'buf' type_id=2, linkage=global : : buf -> ptr -> 'tag1' -> char - GCC + According to DWARF __tag1 applies to type char + According to BTF __tag1 applies to type char : DWARF : 0x0000002e: DW_TAG_variable : DW_AT_name ("buf") : DW_AT_type (0x00000044 "char *") : : 0x00000044: DW_TAG_pointer_type : DW_AT_type (0x00000057 "char") : : 0x0000004d: DW_TAG_LLVM_annotation : DW_AT_name ("btf_type_tag") : DW_AT_const_value ("tag1") : : BTF : [1] INT 'char' size=1 bits_offset=0 nr_bits=8 encoding=SIGNED : [2] PTR '(anon)' type_id=3 : [3] TYPE_TAG 'tag1' type_id=1 : [4] VAR 'buf' type_id=2, linkage=global : : buf -> ptr -> 'tag1' -> char The next example is from from bpf-next/arch/x86/kernel/kgdb.c:184 and shows cases where both the DWARF and the BTF are different: : struct perf_event * __percpu *pev; - sparse + __percpu applies to struct perf_event* : got struct perf_event *[noderef] __percpu *[addressable] [toplevel] pev - clang + According to DWARF __percpu applies to struct perf_event* + According to BTF __percpu applies to struct perf_event* : DWARF : 0x00000023: DW_TAG_variable : DW_AT_name ("pev") : DW_AT_type (0x0000002e "perf_event **") : : 0x0000002e: DW_TAG_pointer_type : DW_AT_type (0x00000037 "perf_event *") : : 0x00000033: DW_TAG_LLVM_annotation : DW_AT_name ("btf_type_tag") : DW_AT_const_value ("percpu") : : 0x00000037: DW_TAG_pointer_type : DW_AT_type (0x0000003c "perf_event") : BTF : [2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED : [4] PTR '(anon)' type_id=5 : [5] STRUCT 'perf_event' size=4 vlen=1 : 'a' type_id=2 bits_offset=0 : [8] TYPE_TAG 'percpu' type_id=4 : [9] PTR '(anon)' type_id=8 : [13] VAR 'pev' type_id=9, linkage=global : : pev -> ptr -> 'percpu' -> ptr -> struct perf_event - GCC + According to DWARF __percpu applies to struct perf_event + According to BTF __percpu applies to struct perf_event : DWARF : 0x00000023: DW_TAG_variable : DW_AT_name ("pev") : DW_AT_type (0x00000039 "perf_event **") : : 0x00000039: DW_TAG_pointer_type (perf_event **) : DW_AT_type (0x0000003f "perf_event *") : : 0x0000003f: DW_TAG_pointer_type (perf_event *) : DW_AT_type (0x0000001e "perf_event") : : 0x00000045: DW_TAG_LLVM_annotation : DW_AT_name ("btf_type_tag") : DW_AT_const_value ("percpu") : BTF : [1] STRUCT 'perf_event' size=4 vlen=1 : 'a' type_id=2 bits_offset=0 : [2] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED : [3] PTR '(anon)' type_id=4 : [4] TYPE_TAG 'percpu' type_id=1 : [5] PTR '(anon)' type_id=3 : [14] VAR 'pev' type_id=5, linkage=global : : pev -> ptr -> ptr -> 'percpu' -> perf_event The following case shows two different attributes applied to pointers-to-pointers: : int __tag1 * __tag2 * g; - sparse + __tag1 applies to int, __tag2 applies to int* : got int [noderef] __tag1 *[noderef] __tag2 *[addressable] [toplevel] g - clang + According to DWARF __tag1 applies to int, __tag2 applies to int* + According to BTF __tag1 applies to int, __tag2 applies to int* : DWARF : 0x00000023: DW_TAG_variable : DW_AT_name ("g") : DW_AT_type (0x0000002e "int **") : : 0x0000002e: DW_TAG_pointer_type : DW_AT_type (0x00000037 "int *") : : 0x00000033: DW_TAG_LLVM_annotation : DW_AT_name ("btf_type_tag") : DW_AT_const_value ("tag2") : : 0x00000037: DW_TAG_pointer_type : DW_AT_type (0x00000040 "int") : : 0x0000003c: DW_TAG_LLVM_annotation : DW_AT_name ("btf_type_tag") : DW_AT_const_value ("tag1") : BTF : [1] TYPE_TAG 'tag2' type_id=4 : [2] PTR '(anon)' type_id=1 : [3] TYPE_TAG 'tag1' type_id=5 : [4] PTR '(anon)' type_id=3 : [5] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED : [6] VAR 'g' type_id=2, linkage=global : : 'g' -> ptr -> 'tag2' -> ptr -> 'tag1' -> int - GCC + According to DWARF __tag1 applies to int*, __tag2 applies to int + According to BTF __tag1 applies to int*, __tag2 applies to int : DWARF : 0x0000002e: DW_TAG_variable : DW_AT_name ("g") : DW_AT_type (0x00000042 "int **") : : 0x00000042: DW_TAG_pointer_type : DW_AT_type (0x00000055 "int *") : : 0x0000004b: DW_TAG_LLVM_annotation : DW_AT_name ("btf_type_tag") : DW_AT_const_value ("tag1") : : 0x00000055: DW_TAG_pointer_type : DW_AT_type (0x00000068 "int") : : 0x0000005e: DW_TAG_LLVM_annotation : DW_AT_name ("btf_type_tag") : DW_AT_const_value ("tag2") : BTF : [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED : [2] PTR '(anon)' type_id=3 : [3] TYPE_TAG 'tag2' type_id=1 : [4] PTR '(anon)' type_id=5 : [5] TYPE_TAG 'tag1' type_id=2 : [6] VAR 'g' type_id=4, linkage=global : : 'g' -> ptr -> 'tag1' -> ptr -> 'tag2' -> int What follows is a typical kernel example. We get "right" results in GCC just because both tags are the same attributes, but effectively we hit the same problem than in the previous example: : int __tag1 * __tag1 * val; - sparse + __tag1 applies to both int and int* : got int [noderef] __tag1 *[noderef] __tag1 *[addressable] [toplevel] val - clang + According to DWARF __tag1 applies to both int and int* + According to BTF __tag1 applies to both int and int* : DWARF : 0x0000001e: DW_TAG_variable : DW_AT_name ("val") : DW_AT_type (0x00000029 "int **") : : 0x00000029: DW_TAG_pointer_type : DW_AT_type (0x00000032 "int *") : : 0x0000002e: DW_TAG_LLVM_annotation : DW_AT_name ("btf_type_tag") : DW_AT_const_value ("tag1") : : 0x00000032: DW_TAG_pointer_type : DW_AT_type (0x0000003b "int") : : 0x00000037: DW_TAG_LLVM_annotation : DW_AT_name ("btf_type_tag") : DW_AT_const_value ("tag1") : BTF : [1] TYPE_TAG 'tag1' type_id=4 : [2] PTR '(anon)' type_id=1 : [3] TYPE_TAG 'tag1' type_id=5 : [4] PTR '(anon)' type_id=3 : [5] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED : [6] VAR 'val' type_id=2, linkage=global : : 'val' -> ptr -> 'tag1' -> ptr -> 'tag1' -> int - GCC + According to DWARF __tag1 applies to both int and int* + According to BTF __tag1 applies to both int and int* : DWARF : 0x0000001e: DW_TAG_variable : DW_AT_name ("val") : DW_AT_type (0x00000034 "int **") : : 0x00000034: DW_TAG_pointer_type : DW_AT_type (0x00000047 "int *") : : 0x0000003d: DW_TAG_LLVM_annotation : DW_AT_name ("btf_type_tag") : DW_AT_const_value ("tag1") : : 0x00000047: DW_TAG_pointer_type : DW_AT_type (0x0000005a "int") : : 0x00000050: DW_TAG_LLVM_annotation : DW_AT_name ("btf_type_tag") : DW_AT_const_value ("tag1") : BTF : [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED : [2] PTR '(anon)' type_id=3 : [3] TYPE_TAG 'tag1' type_id=1 : [4] PTR '(anon)' type_id=5 : [5] TYPE_TAG 'tag1' type_id=2 : [6] VAR 'val' type_id=4, linkage=global : : 'val' -> ptr -> 'tag1' -> ptr -> 'tag1' -> int This last example is another variation of pointer-to-pointer with different tags and different positions: : int * __tag1 * __tag2 h; - sparse + __tag1 applies to int*, __tag2 applies to int** : got int *[noderef] __tag1 *[addressable] [noderef] [toplevel] __tag2 h - clang + According to DWARF __tag1 applies to int*, no __tag2 (??). + According to BTF __tag1 applies to int*, no __tag2 (??). : DWARF : 0x00000023: DW_TAG_variable : DW_AT_name ("h") : DW_AT_type (0x0000002e "int **") : : 0x0000002e: DW_TAG_pointer_type : DW_AT_type (0x00000037 "int *") : : 0x00000033: DW_TAG_LLVM_annotation : DW_AT_name ("btf_type_tag") : DW_AT_const_value ("tag1") : BTF : [1] TYPE_TAG 'tag1' type_id=3 : [2] PTR '(anon)' type_id=1 : [3] PTR '(anon)' type_id=4 : [4] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED : [5] VAR 'h' type_id=2, linkage=global : : 'h' -> ptr -> 'tag1' -> ptr -> int - GCC + According to DWARF __tag1 applies to int*, __tag2 applies to int** + According to BTF __tag1 applies to int, __tag2 applies to int* : DWARF : 0x0000002e: DW_TAG_variable : DW_AT_name ("h") : DW_AT_type (0x00000042 "int **") : : 0x00000042: DW_TAG_pointer_type : DW_AT_type (0x00000055 "int *") : : 0x0000004b: DW_TAG_LLVM_annotation : DW_AT_name ("btf_type_tag") : DW_AT_const_value ("tag2") : : 0x00000055: DW_TAG_pointer_type : DW_AT_type (0x00000068 "int") : : 0x0000005e: DW_TAG_LLVM_annotation : DW_AT_name ("btf_type_tag") : DW_AT_const_value ("tag1") : BTF : [1] INT 'int' size=4 bits_offset=0 nr_bits=32 encoding=SIGNED : [2] PTR '(anon)' type_id=3 : [3] TYPE_TAG 'tag1' type_id=1 : [4] PTR '(anon)' type_id=5 : [5] TYPE_TAG 'tag2' type_id=2 : [6] VAR 'h' type_id=4, linkage=global : : 'h' -> ptr -> 'tag2' -> ptr -> 'tag1' -> int * Solutions ** Reorder sparse annotations in kernel source [Julia Jawall kindly adapted coccinelle so it can handle this and the following possible solution involving modifying kernel sources.] This would basically involve writing and applying a coccinelle script to reorder something like this: : int __tag1 * __tag2 * g; into this : int __tag2 * __tag1 * g; where : #define __tag1 __attribute__((address_space(__tag1))) This totally breaks sparse so it is hardly a realistic solution. ** Add compiler attributes corresponding to existing sparse annotations to the kernel source This would involve writing and applying a coccinelle script to add compiler attributes corresponding to the existing sparse annotations. So from this: : int __tag1 * __tag2 * g; We would go to: : int __tag1 __comp_tag2 * __tag2 __comp_tag1 * g; where : #if __CHECKER__ : # define __tag1 __attribute__((address_space(__tag1))) : #else : # define __tag1 : #endif : #define __comp_tag1 __attribute__((btf_type_tag("tag1"))) This would not break sparse, but would lead to a lot of (non trivial) redundancy in the kernel sources: both sparse annotations and compiler attributes will denote the same thing. ** Adding native support for sparse annotations to GCC and clang This would make both GCC and clang to handle sparse annotations using the sparse ordering. This is what David Malcolm has been working on. There is some previous work From Tom Tromey to support the following sparse annotations: address_space, noderef and force. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59850 (But AFAIK Tom's patch also falls in the mistake of ordering the attributes using GNU-like attribute ordering, not the sparse ordering?) For example, in this C declaration: : int * __attribute__((address_space(__tag1))) * p; The compilers would recognize that as a sparse annotation (and not a GNU-like compiler attribute) and parse it in a way it applies to int* instead of int**. At least two problems with this approach: 1. This could be confusing for users, and difficult to get applied upstream, because sparse annotations look like GNU-like compiler attributes, despite not being the same thing (ordering differently.) 2. sparse lacks support for btf_type_tag and btf_decl_tag, and these compiler attributes are more general than the existing sparse annotations. ** Adding GNU-like compiler attributes to GCC and clang with sparse ordering This is what clang does right now (or tries to do). Solves problem 2. above, but not 1. Maybe problem 1. could be alleviated by oddly-ordering compiler attributes to have a distinguished name, such as using a 'sparse' prefix: : __attribute__((sparse_btf_type_tag ("tag1"))) ** Other solutions?