Re: [PATCH bpf-next v4 14/14] arm64: add a comment that warns that orig_x0 should not be moved

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Tue, 8 Feb 2022 14:23:01 -0800

On Tue, Feb 8, 2022 at 1:46 PM Ilya Leoshkevich <iii@xxxxxxxxxxxxx> wrote:
>
> On Tue, 2022-02-08 at 13:11 -0800, Alexei Starovoitov wrote:
> > On Tue, Feb 8, 2022 at 11:46 AM Ilya Leoshkevich <iii@xxxxxxxxxxxxx>
> > wrote:
> > >
> > > On Tue, 2022-02-08 at 11:25 -0800, Alexei Starovoitov wrote:
> > > > On Tue, Feb 08, 2022 at 06:16:35AM +0100, Ilya Leoshkevich wrote:
> > > > > orig_x0's location is used by libbpf tracing macros, therefore
> > > > > it
> > > > > should not be moved.
> > > > >
> > > > > Suggested-by: Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx>
> > > > > Signed-off-by: Ilya Leoshkevich <iii@xxxxxxxxxxxxx>
> > > > > ---
> > > > >  arch/arm64/include/asm/ptrace.h | 4 ++++
> > > > >  1 file changed, 4 insertions(+)
> > > > >
> > > > > diff --git a/arch/arm64/include/asm/ptrace.h
> > > > > b/arch/arm64/include/asm/ptrace.h
> > > > > index 41b332c054ab..7e34c3737839 100644
> > > > > --- a/arch/arm64/include/asm/ptrace.h
> > > > > +++ b/arch/arm64/include/asm/ptrace.h
> > > > > @@ -185,6 +185,10 @@ struct pt_regs {
> > > > >                         u64 pstate;
> > > > >                 };
> > > > >         };
> > > > > +       /*
> > > > > +        * orig_x0 is not exposed via struct user_pt_regs, but
> > > > > its
> > > > > location is
> > > > > +        * assumed by libbpf's tracing macros, so it should not
> > > > > be
> > > > > moved.
> > > > > +        */
> > > >
> > > > In other words this comment is saying that the layout is ABI.
> > > > That's not the case. orig_x0 here and equivalent on s390 can be
> > > > moved.
> > > > It will break bpf progs written without CO-RE and that is
> > > > expected.
> > > > Non CO-RE programs often do all kinds of bpf_probe_read_kernel
> > > > and
> > > > will be breaking when kernel layout is changing.
> > > > I suggest to drop this patch and patch 12.
> > >
> > > Yeah, that was the intention here: to promote orig_x0 to ABI using
> > > a
> > > comment, since doing this by extending user_pt_regs turned out to
> > > be
> > > infeasible. I'm actually ok with not doing this, since programs
> > > compiled with kernel headers and using CO-RE macros will be fine.
> >
> > The comment like this doesn't convert kernel internal struct into
> > ABI.
> > The comment is just wrong. BPF progs access many kernel data structs.
> > s390's and arm64's struct pr_regs is not special in that sense.
> > It's an internal struct.
> >
> > > As you say, we don't care about programs that don't use CO-RE too
> > > much
> > > here - if they break after an incompatible kernel change, fine.
> >
> > Before CO-RE was introduced bpf progs included kernel headers
> > and were breaking when kernel changes. Nothing new here.
> > See the history of bcc tools. Some of them are littered
> > with ifdef VERSION ==.
> >
> > > The question now is - how much do we care about programs that are
> >
> > > compiled with userspace headers? Andrii suggested to use
> > > offsetofend to
> > > make syscall macros work there, however, this now requires this ABI
> > > promotion.
> >
> > Today s390 and arm64 have user_pt_regs as a first field in pt_regs.
> > That is kernel internal behavior and that part can change if arch
> > maintainers have a need for that.
> > bpf progs without CO-RE would have to be adjusted when kernel
> > changes.
> > Even with CO-RE it's ok to rename pt_regs->orig_gpr2 on s390.
> > The progs with CO-RE will break too. The authors of tracing bpf progs
> > have to expect that sooner or later their progs will break and they
> > would have to adjust them.
>
> When it comes to authors of tracing bpf progs, I agree that eventually
> they will have to adjust their code, CO-RE or not. However, in patch 13
> I introduce the following libbpf macro:
>
> #if defined(__KERNEL__) || defined(__VMLINUX_H__)
> ...
> #else
> #define PT_REGS_PARM1_SYSCALL(x) \
>         (*(unsigned long *)(((char *)(x) + \
>                              offsetofend(struct user_pt_regs,
> pstate))))
>
> If we merge this series without freezing orig_x0's offset, in case of
> an incompatible kernel change the users of PT_REGS_PARMn_SYSCALL and
> BPF_KPROBE_SYSCALL, who build against userspace headers, will not
> simply have to update their code - they will have to upgrade libbpf.
> It's also not clear how PT_REGS_PARM1_SYSCALL in the upgraded libbpf
> will even look like, given that it would need to work on both old and
> new kernels.
>
> I've also briefly looked into MIPS' ptrace.h, and it looks as if their
> user_pt_regs has no relationship to kernel pt_regs. IIUC this means
> that it's not possible to correctly implement PT_REGS_PARMn_SYSCALL
> using MIPS userspace headers.
>
> So I wonder whether we should allow using PT_REGS_PARMn_SYSCALL and
> BPF_KPROBE_SYSCALL with userspace headers at all? Would it make sense
> to simply fail the compilation if PT_REGS_PARMn_SYSCALL is used without
> including kernel headers?

Ok, my bad for suggesting those comments, I didn't realize the
consequences of making anything into a stable ABI. Let's not add any
comments, we don't need that.

I think we should just come to terms that for some architectures this
syscall argument access won't work at least for some architectures
without CO-RE. For uniformity let's still have those
PT_REGS_PARM1_SYSCALL macro defined, but we should use
__bpf_unreachable or something like that to make sure it won't compile
if someone tries to use it.

But it's an entirely different story for CORE variants and there (as I
explained on one of the previous patches) we can fabricate our own
definitions of pt_regs (architecture specific, using CO-RE struct
flavors) without any unnecessary assumptions about which include
headers the user is going to use. Hengqi's BPF_KPROBE_SYSCALL() macro
is always going to use CORE variants and will "just work"(tm).

And because this asymmetry of CORE and non-CORE PT_REGS_PARM_SYSCALL
definitions, your changes in v4 are a regression from the ones in v3
which were absolutely fine (I still don't get why you changed all of
that, I've previously landed v3, which means it was 100% acceptable as
is...), because v4 establishes more rigid relation between CORE and
non-CORE variants.

Anyways, let's get back to v3, drop UAPI changes, add struct
pt_regs__s390x and whatever fields we need, use those with
BPF_CORE_READ() and it should be ok with no ABI changes whatsoever.