On Fri, Sep 9, 2022 at 12:25 PM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Fri, Sep 09, 2022 at 11:32:40AM -0700, Andrii Nakryiko wrote: > > On Fri, Sep 9, 2022 at 7:58 AM Alexei Starovoitov > > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > > > On Fri, Sep 9, 2022 at 7:51 AM Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote: > > > > > > > > On Fri, 9 Sept 2022 at 16:24, Alexei Starovoitov > > > > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > > > > > > > On Fri, Sep 9, 2022 at 4:05 AM Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote: > > > > > > > > > > > > On Fri, 9 Sept 2022 at 10:13, Dave Marchevsky <davemarchevsky@xxxxxx> wrote: > > > > > > > > > > > > > > On 9/4/22 4:41 PM, Kumar Kartikeya Dwivedi wrote: > > > > > > > > Global variables reside in maps accessible using direct_value_addr > > > > > > > > callbacks, so giving each load instruction's rewrite a unique reg->id > > > > > > > > disallows us from holding locks which are global. > > > > > > > > > > > > > > > > This is not great, so refactor the active_spin_lock into two separate > > > > > > > > fields, active_spin_lock_ptr and active_spin_lock_id, which is generic > > > > > > > > enough to allow it for global variables, map lookups, and local kptr > > > > > > > > registers at the same time. > > > > > > > > > > > > > > > > Held vs non-held is indicated by active_spin_lock_ptr, which stores the > > > > > > > > reg->map_ptr or reg->btf pointer of the register used for locking spin > > > > > > > > lock. But the active_spin_lock_id also needs to be compared to ensure > > > > > > > > whether bpf_spin_unlock is for the same register. > > > > > > > > > > > > > > > > Next, pseudo load instructions are not given a unique reg->id, as they > > > > > > > > are doing lookup for the same map value (max_entries is never greater > > > > > > > > than 1). > > > > > > > > > > > > > > > > > > > > > > For libbpf-style "internal maps" - like .bss.private further in this series - > > > > > > > all the SEC(".bss.private") vars are globbed together into one map_value. e.g. > > > > > > > > > > > > > > struct bpf_spin_lock lock1 SEC(".bss.private"); > > > > > > > struct bpf_spin_lock lock2 SEC(".bss.private"); > > > > > > > ... > > > > > > > spin_lock(&lock1); > > > > > > > ... > > > > > > > spin_lock(&lock2); > > > > > > > > > > > > > > will result in same map but different offsets for the direct read (and different > > > > > > > aux->map_off set in resolve_pseudo_ldimm64 for use in check_ld_imm). Seems like > > > > > > > this patch would assign both same (active_spin_lock_ptr, active_spin_lock_id). > > > > > > > > > > > > > > > > > > > That won't be a problem. Two spin locks in a map value or datasec are > > > > > > already rejected on BPF_MAP_CREATE, > > > > > > so there is no bug. See idx >= info_cnt check in > > > > > > btf_find_struct_field, btf_find_datasec_var. > > > > > > > > > > > > I can include offset as the third part of the tuple. The problem then > > > > > > is figuring out which lock protects which bpf_list_head. We need > > > > > > another __guarded_by annotation and force users to use that to > > > > > > eliminate the ambiguity. So for now I just put it in the commit log > > > > > > and left it for the future. > > > > > > > > > > Let's not go that far yet. > > > > > Extra annotations are just as confusing and non-obvious as > > > > > putting locks in different sections. > > > > > Let's keep one lock per map value limitation for now. > > > > > libbpf side needs to allow many non-mappable sections though. > > > > > Single bss.private name is too limiting. > > > > > > > > In that case, > > > > Dave, since the libbpf patch is yours, would you be fine with > > > > reworking it to support multiple private maps? > > > > Maybe it can just ignore the .XXX part in .bss.private.XXX? > > > > Also I think Andrii mentioned once that he wants to eventually merge > > > > data and bss, so it might be a good idea to call it .data.private from > > > > the start? > > > > > > I'd probably make all non-canonical names to be not-mmapable. > > > The compiler generates special sections already. > > > Thankfully the code doesn't use them, but it will sooner or later. > > > So libbpf has to create hidden maps for them eventually. > > > They shouldn't be messed up from user space, since it will screw up > > > compiler generated code. > > > > > > Andrii, what's your take? > > > > Ok, a bunch of things to unpack. We've also discussed a lot of this > > with Dave few weeks ago, but I have also few questions. > > > > First, I'd like to not keep extending ".bss" with any custom ".bss.*" > > sections. This is why we have .data.* and .rodata.* and not .bss (bad, > > meaningless, historic name). > > > > But I'm totally fine dedicating some other prefix to non-mmapable data > > sections that won't be exposed in skeleton and, well, not-mmapable. > > What to name it depends on what we anticipate putting in them? > > > > If it's just for spinlocks, then having something like SEC(".locks") > > seems best to me. If it's for more stuff, like global kptrs, rbtrees > > and whatnot, then we'd need a bit more generic name (.private, or > > whatever, didn't think much on best name). We can also allow .locks.* > > or .private.* (i.e., keep it uniform with .data and .rodata handling, > > expect for mmapable aspect). > > > > One benefit for having SEC(".locks") just for spin_locks is that we > > can teach libbpf to create a multi-element ARRAY map, where each lock > > variable is put into a separate element. From BPF verifier's > > perspective, there will be a single BTF type describing spin lock, but > > multiple "instances" of lock, one per each element. That seems a bit > > magical and I think, generally speaking, it's best to start supporting > > multiple lock declarations within single map element (and thus keep > > track of their offset within map_value); but at least that's an > > option. > > ".lock" won't work. We need lock+rb_root or lock+list_head to be > in the same section. > It should be up to user to name that section with something meaningful. > Ideally something like this should be supported: > SEC("enqueue") struct bpf_spin_lock enqueue_lock; > SEC("enqueue") struct bpf_list_head enqueue_head __contains(foo, node); > SEC("dequeue") struct bpf_spin_lock dequeue_lock; > SEC("dequeue") struct bpf_list_head dequeue_head __contains(foo, node); > > > Dave had some concerns about pinning such maps and whatnot, but for > > starters we decided to not worry about pinning for now. Dave, please > > bring up remaining issues, if you don't mind. > > Pinning shouldn't be an issue. > Only mmap is the problem. User space access if fine since kernel > will mask out special fields on read/write. > > > So to answer Alexei's specific option. I'm still not in favor of just > > saying "anything that's not .data or .rodata is non-mmapable map". I'd > > rather carve out naming prefixes with . (which are reserved for > > libbpf's own use) for these special purpose maps. I don't think that > > limits anyone, right? > > Is backward compat a concern? > Whether to mmap global data is a flag. > It can be opt-in or opt-out. > I'm proposing make all named section to be 'do not mmap'. > If a section needs to be mmaped and appear in skeleton the user can do > SEC("my_section.mmap") > > What you're proposing is to do the other way around: > SEC("enqueue.nommap") > SEC("dequeue.nommap") > in the above example. > I guess it's fine, but more verbose. Well, I didn't propose to use suffixes. Currently user can define SEC(".data.my_custom_use_case"). So I was proposing that we'll just define a different *prefix*, like SEC(".private.enqueue") and SEC(".private.dequeue") for your example above, which will be private to BPF program, not mmap'ed, not exposed in skeleton. mmap is a bit orthogonal to exposing in skeleton, you can still imagine data section that will be allowed to be initialized from user-space before load but never mmaped afterwards. Just to say that .nommap doesn't necessarily imply that it shouldn't be in a skeleton. So I still prefer special prefix (.private) and declare that this is both non-mmapable and not-exposed in skeleton. As for allowing any section. It just feels unnecessary and long-term harmful to allow any section name at this point, tbh. > The gut feeling is that the use case for naming section will be specifically > for lock+rbtree. Everything else will go into common global .data or .rodata. > Same thinking about compiler generated special sections with constants. > They shouldn't be mmaped by default, but we're not going to hack llvm > to add ".nommap" suffix to such sections. > Hence the proposal to avoid mmap by default for all non standard sections.