Re: kselftest build broken?

Dmitry Vyukov <dvyukov@xxxxxxxxxx> · Wed, 12 Jun 2019 20:29:44 +0200

On Wed, Jun 12, 2019 at 6:45 PM shuah <shuah@xxxxxxxxxx> wrote:
> Hi Dmitry,
>
> This is the 6th email from you in a span of 3 hours! I am just going to
> respond this last one. Please try to summarize your questions instead of
> sending email storm, so it will be easier to parse and more productive
> for both of us.

Hi Shuah,

Sorry for that. Let me combine all current questions in a more structured way.

My motivation: I am trying to understand what it takes to run/add
kernel tests, in particular for the purpose of providing working
instructions for running kernel tests to a new team member or a new
external kernel developer, and whether it's feasible to ask a kernel
developer fixing a bug to add a regression test and ensure that it
works. Note that in these cases a user may not have much specific
expertise (e.g. any unsaid/implicit thing may be a showstopper) and/or
doesn't have infinite motivation/time (may give up given a single
excuse to do so) and/or doesn't have specific interest/expertise in
the tested subsystem (e.g. a drive-by fix). So now I am trying to
follow this route myself, documenting the steps.

1. You suggested installing a bunch of packages. That helped to some
degree. Is there a way to figure out what packages one needs to
install to build the tests other than asking you?

2. Build of bpf tests was broken after installing all required
packages. It helped to delete some random files
(tools/testing/selftests/bpf/{feature,FEATURE-DUMP.libbpf}). Is it
something to fix in kselftests? Deleting random files was a chaotic
action which I can't explain to anybody.

3. I am still getting 1 build error:

  CC       /usr/local/google/home/dvyukov/src/linux/tools/testing/selftests/bpf/str_error.o
timestamping.c:249:19: error: ‘SIOCGSTAMP’ undeclared (first use in this function); did you mean ‘SIOCGSTAMPNS’?

What should I do to fix this?

4. Are individual test errors supposed to be fatal? Or can I just
ignore a single error and proceed?
I've tried to proceed, but I am not sure if I will get some
unexplainable errors later because of that. By default I would assume
that any errors during make are fatal.

5. The instructions on running tests:

  $ make -C tools/testing/selftests run_tests
  $ make kselftest

Do they assume that the tests will run right on my host machine? It's
not stated/explained anywhere, but I don't see how "make kselftest"
can use my usual setup because it doesn't know about it.
I cannot run tests on the host. Policy rules aside, this is an as yet
untested kernel, so by installing it I am risking losing my whole
machine.
Reading further, "Install selftests" and "Running installed selftests"
sections seem to be a way to run tests on another machine. Is that
correct? Are there any other options? There seems to be a bunch of
implicit unsaid things, so I am asking in case I am missing some even
simpler way to run tests.
Or otherwise, what is the purpose of "installing" tests?

6. The "Running installed selftests" section says:
"Kselftest install as well as the Kselftest tarball provide a script
named "run_kselftest.sh" to run the tests".

What is the "Kselftest tarball"? Where does one get one? I don't see
any mentions of "tarball" anywhere else in the doc.

7. What image am I supposed to use to run kselftests? Say, my goal is
either running as many tests as possible (CI scenario), or tests for a
specific subsystem (a drive-by fix scenario).
All images that I have do not seem to be suitable. One is failing with:
./run_kselftest.sh: 2: ./run_kselftest.sh: realpath: not found
And there is no clear path to fix this. After I guessed the right
package to install, it turned out to be broken in the distro.
In another image all C programs fail to run with:
./test_maps: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.26'

How is one supposed to get an image suitable for running kselftests?

8. Lots of tests fail or are skipped with errors that are cryptic to me, like:

# Cannot find device "ip6gre11"

# selftests: [SKIP] Could not run test without the ip xdpgeneric support

# modprobe: ERROR: ../libkmod/libkmod.c:586 kmod_search_moddep() could not open moddep file '/lib/modules/5.1.0+/modules.dep.bin'

# selftests: bpf: test_tc_edt.sh
# nc is not available
not ok 40 selftests: bpf: test_tc_edt.sh

Say, I either want to run tests for a specific subsystem because I am
doing a drive-by fix (a typical newcomer/good Samaritan scenario), or
I want to run as many tests as possible (a typical CI scenario). Is
there a way to bulk satisfy all these prerequisites (configs, binaries
and whatever else they are asking for)?
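
To make the question concrete, here is a minimal sketch of the kind of
preflight check I would otherwise have to write by hand. The helper name
missing_tools is hypothetical (it is not part of kselftest), and the
tool list in the usage line below is only what I happened to hit in the
errors above, not a complete list of what the tests need:

```shell
#!/bin/sh
# Hypothetical helper, not part of kselftest: print every named command
# that cannot be found in PATH on the current image.
missing_tools() {
    for tool in "$@"; do
        # "command -v" is the POSIX way to ask whether a command resolves.
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

On the test image I would run something like
"missing_tools realpath nc ip modprobe" and install whatever it prints.
It says nothing about missing kernel configs or modules, which is the
part I don't know how to check in bulk.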

9. There is a test somewhere in the middle that consistently reboots my machine:

# selftests: breakpoints: step_after_suspend_test
[  514.024889] PM: suspend entry (deep)
[  514.025959] PM: Syncing filesystems ... done.
[  514.051573] Freezing user space processes ... (elapsed 0.001 seconds) done.
[  514.054140] OOM killer disabled.
[  514.054764] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[  514.057695] printk: Suspending console(s) (use no_console_suspend to debug)
early console in extract_kernel
...

Is it a bug in the test? In the kernel? Or how is this supposed to
work, and what am I supposed to do about it?

10. Do you know if anybody is running kselftests? Running as in
running continuously, noticing new failures, reporting these failures,
keeping them green, etc.
I am asking because one of the tests triggers a use-after-free, and I
checked that it was the same 3+ months ago. And I have some vague
memories of trying to run kselftests 3 or so years ago, and there was a
bunch of use-after-frees as well.

11. Do we know the current code coverage achieved by kselftests?
What's covered? What's not? Overall percentage, per subsystem, etc.?

12. I am asking about the aggregate result, because that's usually the
first thing anybody needs (both devs testing a change and a CI). You
said that kselftest does not keep track of the aggregate result. So
the intended usage is to always store all output to a file and then
grep it for "[SKIP]" and "[FAIL]". Is that correct?
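
For concreteness, this is roughly what I understand that workflow to
be, as a minimal sketch. It assumes the full output was saved to a file
and that "[SKIP]", "[FAIL]" and leading "not ok" lines are the only
markers that matter, which is my guess from the output I saw. The
function name summarize_kselftest_log is hypothetical, not something
kselftest provides:

```shell
#!/bin/sh
# Hypothetical helper, not part of kselftest: derive an aggregate result
# from a saved log. Usage: summarize_kselftest_log kselftest.log
summarize_kselftest_log() {
    log="$1"
    # grep -c prints the number of matching lines (and exits non-zero
    # when that number is 0, hence the "|| true"); -F matches the
    # bracketed markers as fixed strings rather than as regex classes.
    skipped=$(grep -cF '[SKIP]' "$log" || true)
    failed=$(grep -cF '[FAIL]' "$log" || true)
    not_ok=$(grep -c '^not ok ' "$log" || true)
    echo "skip=$skipped fail=$failed not_ok=$not_ok"
    # Exit status is non-zero if anything failed, so a CI can act on it.
    [ "$failed" -eq 0 ] && [ "$not_ok" -eq 0 ]
}
```

Skips do not affect the exit status in this sketch; whether a skip
should count as a failure is part of what I am asking.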

Thanks in advance for bearing with me.