Re: [PATCH RFC bpf-next v1 21/32] bpf: Allow locking bpf_spin_lock global variables

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Sun, 11 Sep 2022 15:31:32 -0700

On Fri, Sep 09, 2022 at 05:21:52PM -0700, Andrii Nakryiko wrote:
> > >
> > > Well, I didn't propose to use suffixes. Currently user can define
> > > SEC(".data.my_custom_use_case").
> >
> > ... and libbpf will mmap such maps and will expose them in skeleton.
> > My point that it's an existing bug.
> 
> hm... it's not a bug, it's a desired feature. I wanted
> 
> int my_var SEC(".data.mine");
> 
> to be just like .data but in a separate map. So no bug here.

".rodata.*" and ".data.*" section names are effectively reserved by the compiler.
Sooner or later there will be trouble if users start mixing compiler
sections with their own section names like ".data.mine".
In bpf backend we specicialy check for '.rodata' prefix and avoid emitting BTF.
llvm can emit .rodata.cst%d, .rodata.str%d.%d, .data.rel, etc
Not sure how many such special sections will be generated once
bpf progs will get bigger, but creating them as maps will waste
plenty of kernel memory due to page align in bpf-array.
llvm should probably combine them when possible and minimize section
usage in general, but that's orthogonal.
Mixing user and compiler sections under the same prefix is just asking for trouble.

> > Compiler generated .rodata.str1.1 sections should not be messed by
> > user space. There is no BTF for them either.
> 
> Shouldn't but could if they wanted to without skeleton as well. In
> generated skeleton there will be an empty struct for this and no field
> for each of compiler's constant. User has to intentionally do
> something to harm themselves, which we can never stop either way.
> 
> So stuff like .rodata.str1.1 exposes a bit of compiler implementation
> details, but overall idea of allowing custom .data.xxx and .rodata.xxx
> sections was to make them mmapable and readable/writable through
> skeleton.

It's not a good idea to expose compiler internals into skeleton
and even worse to ask users to operate in the compiler's namespace.

> Carving out some sub-namespace based on special suffix feels wrong.

agree. suffix doesn't work, since prefix is already owned by the compiler.

> > mmap and subsequent write by user space won't cause a crash for bpf prog,
> > but it won't be doing what C code intended.
> > There is nothing in there for skeleton and user space to see,
> > but such map should be created, populated and map_fd provided to the prog to use.
> >
> > > So I was proposing that we'll just
> > > define a different *prefix*, like SEC(".private.enqueue") and
> > > SEC(".private.dequeue") for your example above, which will be private
> > > to BPF program, not mmap'ed, not exposed in skeleton.
> > >
> > > mmap is a bit orthogonal to exposing in skeleton, you can still
> > > imagine data section that will be allowed to be initialized from
> > > user-space before load but never mmaped afterwards. Just to say that
> > > .nommap doesn't necessarily imply that it shouldn't be in a skeleton.
> >
> > Well. That's true for normal skeleton and for lskel,
> > but not the case for kernel skel that doesn't have mmap.
> > Exposing a map in skel implies that it will be accessed not only
> > after _open and before _load, but after load as well.
> > We can say that mmap != expose in skeleton, but I really don't see
> > such feature being useful.
> 
> It's basically what .rodata is, actually. It's something that's
> settable from user-space before load and that's it. Yes, you can read
> it after load, but no one does it in practice. But we are digressing,
> I understand you want to make this short and sweet and I agree with
> you. I just disagree about wildcard rule for any non-dotted ELF
> section or using special suffix. See below.
> 
> >
> > > So I still prefer special prefix (.private) and declare that this is
> > > both non-mmapable and not-exposed in skeleton.
> > >
> > > As for allowing any section. It just feels unnecessary and long-term
> > > harmful to allow any section name at this point, tbh.
> >
> > Fine. How about a single new character instead of '.private' prefix ?
> > Like SEC("#enqueue") that would mean no-skel and no-mmap ?
> >
> > Or double dot SEC("..enqueue") ?
> >
> > '.private' is too verbose and when it's read in the context of C file
> > looks out of place and confusing.
> 
> As I said, I gave zero thought to .private, I just took it from
> ".bss.private". I'd like to keep it "dotted", so SEC("#something") is
> very "unusual". Me not like.

Why is this unusual? We have SEC("?tc") already.
SEC("#foo") is very similar.
Dot prefix is special. Something compiler will generate
whereas the section that starts with [A-z] is user's.
So reserving a prefix that starts from [A-z] would be wrong.

> For double-dot, could be just SEC("..data") and generalized to
> SEC("..data.<custom>")? BTW, we can add a macro, similar to __kconfig,
> to hide more descriptive and longer name. E.g.,
> 
> struct bpf_spin_lock my_lock __internal;

Macro doesn't help, since the namespace is broken anyway.
'..data' is dangerous because something might be doing strstr(".data")
instead of strcmp(".data") and will match that section erroneously.

> __internal, __private, __secret, don't know, naming is hard.

Right. We already use special meaning for text section names: tc, xdp
which is also far from ideal, but that's too late to change.
For data I'm arguing that only '.[ro]data' should appear in skel
and compiler internals '.[ro]data.*' should not leak to users.
Then emit all of [A-z] starting sections, since that's what compilers
and linkers will do, but reserve a single character like '#'
or whatever other char to mean that this section shouldn't be mmapable.