Re: Kernel 5.5.4 build fail for BPF-selftests with latest LLVM

Jesper Dangaard Brouer <brouer@xxxxxxxxxx> · Thu, 20 Feb 2020 17:37:40 +0100

On Wed, 19 Feb 2020 17:47:23 -0700
shuah <shuah@xxxxxxxxxx> wrote:

> On 2/19/20 5:27 PM, Alexei Starovoitov wrote:
> > On Wed, Feb 19, 2020 at 03:59:41PM -0600, Daniel Díaz wrote:  
> >>>
> >>> When I download a specific kernel release, how can I know what LLVM
> >>> git-hash or version I need (to use BPF-selftests)?  
> > 
> > as discussed we're going to add documentation-like file that will
> > list required commits in tools.
> > This will be enforced for future llvm/pahole commits.
> >   
> >>> Do you think it is reasonable to require end-users to compile their own
> >>> bleeding edge version of LLVM, to use BPF-selftests?  
> > 
> > absolutely.  
> 
> + linux-kselftest@xxxxxxxxxxxxxxx
> 
> End-users in this context are users and not necessarily developers.

I agree.  And I worry that we are making it increasingly hard for
non-developer users.

> > If a developer wants to send a patch they must run all selftests and
> > all of them must pass in their environment.
> > "but I'm adding a tracing feature and don't care about networking tests
> > failing"... is not acceptable.  
> 
> This is a reasonable expectation when a developer sends bpf patches.

Sure. I have several versions of LLVM that I've compiled manually.

> >   
> >>> I do hope that some end-users of BPF-selftests will be CI-systems.
> >>> That also implies that CI-system maintainers need to constantly do
> >>> "latest built from sources" of LLVM git-tree to keep up.  Is that a
> >>> reasonable requirement when buying a CI-system in the cloud?  
> > 
> > "buying CI-system in the cloud" ?
> > If I could buy such system I would pay for it out of my own pocket to save
> > maintainer's and developer's time.

And Daniel Díaz wants to offer his help below (testing it on archs
that you likely don't even have access to). That sounds like a good
offer, and you don't even have to pay.

> >   
> >> We [1] are end users of kselftests and many other test suites [2]. We
> >> run all of our testing on every git-push on linux-stable-rc, mainline,
> >> and linux-next -- approximately 1 million tests per week. We have a
> >> dedicated engineering team looking after this CI infrastructure and
> >> test results, and as such, I can wholeheartedly echo Jesper's
> >> sentiment here: We would really like to help kernel maintainers and
> >> developers by automatically testing their code in real hardware, but
> >> the BPF kselftests are difficult to work with from a CI perspective.
> >> We have caught and reported [3] many [4] build [5] failures [6] in the
> >> past for libbpf/Perf, but building is just one of the pieces. We are
> >> unable to run the entire BPF kselftests because only a part of the
> >> code builds, so our testing is very limited there.
> >>
> >> We hope that this situation can be improved and that our and everyone
> >> else's automated testing can help you guys too. For this to work out,
> >> we need some help.  
> >   
> 
> It would be helpful to understand what "help" is in this context.
> 
> > I don't understand what kind of help you need. Just install the
> > latest tools.  

I admire that you want to push *everybody* forward to use the latest
LLVM, but saying that "latest" means the LLVM devel git tree HEAD is too
extreme.  I can support saying that the latest LLVM release is required.

As soon as your LLVM patches are accepted into the llvm-git-tree, you will
add some BPF selftests that utilize them. When CI-systems then pull the
latest bpf-next, they will start failing to compile the BPF-selftests, and
CI stops.  Now you want to force the CI-system maintainer to recompile LLVM
from git.  This will likely take some time.  Until that happens, the
CI-system doesn't catch anything. E.g. I really want the ARM tests that
Linaro can run for us (which aren't run before you apply patches...).

> What would be helpful is to write bpf tests such that older tests that
> worked on older llvm versions continue to work and with some indication
> on which tests require new bleeding edge tools.
> 
> > Both the latest llvm and the latest pahole are required.  
> 
> It would be helpful if you can elaborate why latest tools are a
> requirement.
> 
> > If by 'help' you mean to tweak selftests to skip tests then it's a nack.
> > We have human driven CI. Every developer must run selftests/bpf before
> > emailing the patches. Myself and Daniel run them as well before applying.
> > These manual runs are the only thing that keeps bpf tree going.
> > If selftests get to skip tests humans will miss those errors.
> > When I don't see '0 SKIPPED, 0 FAILED' I go and investigate.
> > Anything but zero is a path to broken kernels.
> > 
> > Imagine the tests would get skipped when pahole is too old.
> > That would mean all of the kernel features from year 2019
> > would get skipped. Is there a point of running such selftests?
> > I think the value is not just zero. The value is negative.
> > Such selftests that run old stuff would give false belief
> > that they do something meaningful.
> > "but CI can do build only tests"... If 'helping' such CI means hurting the
> > key developer/maintainer workflow such CI is on its own.
> >   
> 
> Skipping tests will be useless. I am with you on that. However,
> figuring out how to maintain some level of backward compatibility
> to run at least older tests and warn users to upgrade would be
> helpful.

What I propose is that a BPF-selftest that uses a new LLVM feature
should return FAIL (or perhaps SKIP) when it is compiled with, say, an
LLVM that is one release old. This will allow new tests to show up in
CI-system reports as FAIL, and give everybody breathing room to upgrade
their LLVM compiler.
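
To make the idea concrete, here is a rough sketch (in Python, only for
illustration) of what such a per-test version gate could look like.  The
test names, the required versions and the helper functions are all
hypothetical; nothing like this exists in selftests/bpf today, and a real
version would live in the Makefile or the C test runner.

```python
import re

# Hypothetical minimum LLVM major version per test.  These names and
# numbers are invented for illustration, not taken from the kernel tree.
REQUIRED_LLVM_MAJOR = {
    "test_old_feature": 9,
    "test_new_feature": 10,
}


def parse_clang_major(version_output):
    """Extract the major version from `clang --version` style output."""
    match = re.search(r"clang version (\d+)\.", version_output)
    return int(match.group(1)) if match else None


def gate_test(test_name, version_output):
    """Return ("RUN", "") if the compiler is new enough for this test,
    otherwise ("FAIL", reason) so the test still shows up in the report."""
    major = parse_clang_major(version_output)
    needed = REQUIRED_LLVM_MAJOR[test_name]
    if major is None:
        return "FAIL", "could not determine clang version"
    if major < needed:
        return "FAIL", f"{test_name} needs LLVM >= {needed}, found {major}"
    return "RUN", ""
```

With an LLVM 9 compiler, gate_test("test_new_feature", ...) reports FAIL
with a reason that names the required version, while test_old_feature
still builds and runs.  The result is never silently dropped, so a
maintainer looking for "0 FAILED" still sees that something is off.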

> I suspect currently users are ignoring bpf failures because they
> are unable to keep up with the requirement to install newer tools
> to run the tests. This isn't great either.

Yes, my worry is also that we are simply making it too difficult for
non-developer users to run these tests.  And I specifically want to
attract CI-systems to run these, especially Linaro, who have a
dedicated engineering team looking after their CI infrastructure, and
who explicitly confirm my worry in this email.

> Users that care are sharing their pain to see if they can get some
> help or explanation on why new tools are required every so often.
> I don't think everybody understands why. :)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer