Re: BPF logging infrastructure. Was: [PATCH bpf-next 4/6] tools: add new members to bpf_attr.raw_tracepoint in bpf.h

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 15, 2020 at 4:41 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Wed, Jul 15, 2020 at 02:56:36PM +0200, Toke Høiland-Jørgensen wrote:
> > Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> writes:
> >
> > > On Wed, Jul 15, 2020 at 12:19:03AM +0200, Toke Høiland-Jørgensen wrote:
> > >> Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes:
> > >>
> > >> >> However, assuming it *is* possible, my larger point was that we
> > >> >> shouldn't add just a 'logging struct', but rather a 'common options
> > >> >> struct' which can be extended further as needed. And if it is *not*
> > >> >> possible to add new arguments to a syscall like you're proposing, my
> > >> >> suggestion above would be a different way to achieve basically the same
> > >> >> (at the cost of having to specify the maximum reserved space in advance).
> > >> >>
> > >> >
> > >> > yeah-yeah, I agree, it's less a "logging attr", more of "common attr
> > >> > across all commands".
> > >>
> > >> Right, great. I think we are broadly in agreement with where we want to
> > >> go with this, actually :)
> > >
> > > I really don't like 'common attr across all commands'.
> > > Both of you are talking as libbpf developers who occasionally need to
> > > add printk-s to the kernel. That is not an excuse to bloat api that will be
> > > useful to two people.
> >
> > What? No, this is about making error messages comprehensible to people
> > who *can't* just go around adding printks. "Guess the source of the
> > EINVAL" is a really bad user experience!
> >
> > > The only reason log_buf sort-of make sense in raw_tp_open is because
> > > btf comparison is moved from prog_load into raw_tp_open.
> > > Miscompare of (prog_fd1, btf_id1) vs (prog_fd2, btf_id2) can be easily solved
> > > by libbpf with as nice and as human friendly message libbpf can do.
> >
> > So userspace is supposed to replicate all the checks done by the kernel
> > because we can't be bothered to add proper error messages? Really?
>
> That's not what I said. The kernel can report unique errno for miscompare
> and all nice messages can and _should be_ be printed by libbpf.
>
>
> On Wed, Jul 15, 2020, Andrii Nakryiko wrote:
> >
> > Inability to figure out what's wrong when using BPF is at the top of
> > complaints from many users, together with hard to understand logs from
> > verifier.
>
> Only the second part is true. All users are complaining about the verifier.
> No one is complaing that failed prog attach is somehow lacking string message.

Ok, next time I'll be helping someone to figure out another -EINVAL,
I'll remember to reassure them that it's not really frustrating, not a
guess game, and not a time sink at all.

> The users are also complaing about libbpf being too verbose.

Very well might be, but apart from your complaints on that patch
adding program loading debug message, I can't remember a single case
when someone complained about that. Do you have a link for me to get
some context?

> Yet you've refused to address the verbosity where it should be reduced and

It's open-source, everyone is welcome to submit their patches. Just
because I don't think we need to remove some log messages and thus not
am creating such patches, doesn't mean it can't be done by someone
else.

> now refusing to add it where it's needed.

Can you point to or quote where I refused to add a helpful message to libbpf?

> It's libbpf job to explain users kernel errors.

To the best of its ability, yes. Unfortunately there were many times
where I, as a human, couldn't figure it out without printk'ing my way
around the kernel. If I can't do that, I can't teach libbpf to do it.
Error codes are just not granular enough to allow distinguishing a lot
of error conditions, either by humans or automatically by libbpf.

>
> The same thing is happening with perf_event_open syscall.
> Every one who's trying to code it directly complaining about the kernel. But
> not a single user is complaing about perf syscall when they use libraries and
> tools. Same thing with bpf syscall. libbpf is the interface. It needs to clear
> and to the point. Right now it's not doing it well. elf dump is too verbose and
> unnecessary whereas in other places it says nothing informative where it
> could have printed human hint.
>
> libbpf's pr_perm_msg() hint is the only one where libbpf cares about its users.
> All other messages are useful to libbpf developers and not its users.

"Couldn't load trivial BPF program. Make sure your kernel supports BPF
(CONFIG_BPF_SYSCALL=y) and/or that RLIMIT_MEMLOCK is set to big enough
value."
"kernel doesn't support global data"
"can't attach BPF program w/o FD (did you load it?)"
"specified path %s is not on BPF FS"
"vmlinux BTF is not found" -- we should clearly add "you need a kernel
built with CONFIG_DEBUG_INFO_BTF=y"
"invalid relo for \'%s\' in special section 0x%x; forgot to initialize
global var?.."

And so on. Sure, we can have more and better error messages, no one is
claiming libbpf is perfect or done.

That doesn't mean, though, that the kernel itself can't do better in
terms of error reporting. But you clearly don't think it's a real
problem, so I'll let it rest, thank you.

>
> The kernel verifier messages suck as well. They need to be improved.
> But this thread 'lets add strings everywhere and users will be happy' is
> completely missing the mark.




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux