On 12/11/2019 23:18, Alexei Starovoitov wrote: > On Tue, Nov 12, 2019 at 09:25:06PM +0000, Edward Cree wrote: >> Fwiw the 'natural' C way of doing it would be that for any extern symbol in >> the C file, the ELF file gets a symbol entry with st_shndx=SHN_UNDEF, and >> code in .text that uses that symbol gets relocation entries. That's (AIUI) >> how it works on 'normal' architectures, and that's what my ebld linker >> understands; when it sees a definition in another file for that symbol >> (matched just by the symbol name) it applies all the relocations of the >> symbol to the appropriate progbits. >> I don't really see what else you could define 'extern' to mean. > That's exactly the problem with standard 'extern'. ELF preserves the name only. > There is no type. But if you have BTFs, then you can look up each symbol's type at link time and check that they match. Point being that the BTF is for validation (and therefore strictly optional at this point) rather than using it to identify a symbol. Trouble with the latter is that BTF ids get renumbered on linking, so any references to them have to change, whereas symbol names stay the same. > There is also > no way to place extern into a section. Currently SEC("..") is a standard way to > annotate bpf programs. While the symbol itself doesn't have a section, each _use_ of the symbol has a reloc, and the SHT_REL[A] in which that reloc resides has a sh_info specifying "the section header index of the section to which the relocation applies." So can't that be used if symbol visibility needs to depend on section? Tbh I can't exactly see why externs need placing in a section in the first place. > I think reliable 'extern' has to have more than just > name. 'extern int foo;' can be a reference to 'int foo;' in another BPF ELF > file, or it can be a reference to 'int foo;' in already loaded BPF prog, or it > can be a reference to 'int foo;' inside the kernel itself, or it can be a > reference to pseudo variable that libbpf should replace. For example 'extern > int kernel_version;' or 'extern int CONFIG_HZ;' would be useful extern-like > variables that program might want to use. Disambiguating by name is probably > not enough. We can define an order of resolution. libbpf will search in other > .o first, then will search in loaded bpf progs, than in kernel, and if all > fails than will resolve things like 'extern int CONFIG_HZ' on its own. It feels > fragile though. It sounds perfectly reasonable and not fragile to me. The main alternative I see, about equally good, is to not allow defining symbols that are already (non-weakly) defined; so if a bpf prog tries to globally declare "int CONFIG_HZ" or "int netif_receive_skb(struct sk_buff *skb)" then it gets rejected. > I think we need to be able to specify something like section to > extern variables and functions. It seems unnecessary to have the user code specify this. Another a bad analogy: in userland C code you don't have to annotate the function protos in your header files to say whether they come from another .o file, a random library or the libc. You just declare "a function called this exists somewhere and we'll find it at link time". > I was imagining that the verifier will do per-function verification > of program with sub-programs instead of analyzing from root. Ah I see. Yes, that's a very attractive design. If we make it from a sufficiently generic idea of pre/postconditions, then it could also be useful for e.g. loop bodies (user-supplied annotations that allow us to walk the body only once instead of N times); then a function call just gets standard pre/postconditions generated from its argument types if the user didn't specify something else. That would then also support things like: > The next step is to extend this thought process to integers. > int foo(struct xdp_md *arg1, int arg2); > The verifier can check that the program is valid for any valid arg1 and > arg2 = mark_reg_unbounded(). ... this but arg2 isn't unbounded. However, it might be difficult to do this without exposing details of the verifier into the ABI. Still, if we can it sounds like it would make John quite happy too. And of course it doesn't need to have the user annotations from the beginning, it can start out as just the kernel generating pre/post conditions internally. -Ed