LKFT CI: improving Networking selftests results when validating stable kernels

Matthieu Baerts <matttbe@xxxxxxxxxx> · Fri, 8 Nov 2024 19:21:59 +0100

Hello LKFT maintainers, CI operators,

First, I would like to say thank you to the people behind the LKFT
project for validating stable kernels (and more), and including some
Network selftests in their tests suites.

A lot of improvements around the networking kselftests have been done
this year. At the last Netconf [1], we discussed how these tests were
validated on stable kernels from CIs like the LKFT one, and we have some
suggestions to improve the situation.

KSelftests from the same version
--------------------------------

According to the doc [2], kselftests should support all previous kernel
versions. The LKFT CI is then using the kselftests from the last stable
release to validate all stable versions. Even if there are good reasons
to do that, we would like to ask for an opt-out for this policy for the
networking tests: this is hard to maintain with the increased
complexity, hard to validate on all stable kernels before applying
patches, and hard to put in place in some situations. As a result, many
tests are failing on older kernels, and it looks like it is a lot of
work to support older kernels, and to maintain this.

Many networking tests are validating the internal behaviour that is not
exposed to the userspace. A typical example: some tests look at the raw
packets being exchanged during a test, and this behaviour can change
without modifying how the userspace is interacting with the kernel. The
kernel could expose capabilities, but that's not something that seems
natural to put in place for internal behaviours that are not exposed to
end users. Maybe workarounds could be used, e.g. looking at kernel
symbols, etc. Nut that doesn't always work, increase the complexity, and
often "false positive" issue will be noticed only after a patch hits
stable, and will cause a bunch of tests to be ignored.

Regarding fixes, ideally they will come with a new or modified test that
can also be backported. So the coverage can continue to grow in stable
versions too.

Do you think that from the kernel v6.12 (or before?), the LKFT CI could
run the networking kselftests from the version that is being validated,
and not from a newer one? So validating the selftests from v6.12.1 on a
v6.12.1, and not the ones from a future v6.16.y on a v6.12.42.

Skipped tests
-------------

It looks like many tests are skipped:

- Some have been in a skip file [3] for a while: maybe they can be removed?

- Some are skipped because of missing tools: maybe they can be added?
e.g. iputils, tshark, ipv6toolkit, etc.

- Some tests are in 'net', but in subdirectories, and hence not tested,
e.g. forwarding, packetdrill, netfilter, tcp_ao. Could they be tested too?

How can we change this to increase the code coverage using existing tests?

KVM
---

It looks like different VMs are being used to execute the different
tests. Do these VMs benefit from any accelerations like KVM? If not,
some tests might fail because the environment is too slow.

The KSFT_MACHINE_SLOW=yes env var can be set to increase some
tolerances, timeout or to skip some parts, but that might not be enough
for some tests.

Notifications
-------------

In case of new regressions, who is being notified? Are the people from
the MAINTAINERS file, and linked to the corresponding selftests being
notified or do they need to do the monitoring on their side?

Looking forward to improving the networking selftests results when
validating stable kernels!

[1] https://netdev.bots.linux.dev/netconf/2024/
[2] https://docs.kernel.org/dev-tools/kselftest.html
[3]
https://github.com/Linaro/test-definitions/blob/master/automated/linux/kselftest/skipfile-lkft.yaml

Cheers,
Matt
-- 
Sponsored by the NGI0 Core fund.