Re: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking

Guenter Roeck <linux@xxxxxxxxxxxx> · Wed, 5 Jul 2017 08:16:33 -0700

On 07/05/2017 07:06 AM, Greg KH wrote:
> On Wed, Jul 05, 2017 at 09:27:57AM -0400, Steven Rostedt wrote:
>> Your "b" above is what I would like to push. But who's going to enforce
>> this? With 10,000 changes per release, and a lot of them are fixes, the
>> best we can do is the honor system. Start shaming people that don't
>> have a regression test along with a Fixes tag (but we don't want people
>> to fix bugs without adding that tag either). There is a fine line one
>> must walk between getting people to change their approaches to bugs and
>> regression tests, and pissing them off where they start doing the
>> opposite of what would be best for the community.
>
> I would bet, for the huge majority of our fixes, they are fixes for
> specific hardware, or workarounds for specific hardware issues.  Now
> writing tests for those is not an impossible task (look at what the i915
> developers have), but it is very very hard overall, especially if the
> base infrastructure isn't there to do it.
>
> For specific examples, here's the shortlog for fixes that went into
> drivers/usb/host/ for 4.12 after 4.12-rc1 came out.  Do you know of a
> way to write a test for these types of things?
>	usb: xhci: ASMedia ASM1042A chipset need shorts TX quirk
>	usb: xhci: Fix USB 3.1 supported protocol parsing
>	usb: host: xhci-plat: propagate return value of platform_get_irq()
>	xhci: Fix command ring stop regression in 4.11
>	xhci: remove GFP_DMA flag from allocation
>	USB: xhci: fix lock-inversion problem
>	usb: host: xhci-ring: don't need to clear interrupt pending for MSI enabled hcd
>	usb: host: xhci-mem: allocate zeroed Scratchpad Buffer
>	xhci: apply PME_STUCK_QUIRK and MISSING_CAS quirk for Denverton
>	usb: xhci: trace URB before giving it back instead of after
>	USB: host: xhci: use max-port define
>	USB: ehci-platform: fix companion-device leak
>	usb: r8a66597-hcd: select a different endpoint on timeout
>	usb: r8a66597-hcd: decrease timeout
>
> And look at the commits with the "Fixes:" tag in it, I do, I read every
> one of them.  See if writing a test for the majority of them would even
> be possible...
>
> I don't mean to poo-poo the idea, but please realize that around 75% of
> the kernel is hardware/arch support, so that means that 75% of the
> changes/fixes deal with hardware things (yes, change is in direct
> correlation to size of the codebase in the tree, strange but true).

The reproducers for several of the USB fixes I submitted recently took hours of
stress testing to reproduce the underlying problems. I have one more to fix which
takes days to reproduce, if at all (I have seen that problem only two or three
times during weeks of stress testing). Due to the nature of the problems, reproducing
them heavily depended on the underlying hardware. None of the reproducers can
guarantee that the problem is fixed; they are intended to show the problem,
not that it is fixed. This happens a lot with race conditions - in many cases
it is impossible to prove that the problem is fixed; one can only prove that
it still exists.

Echoing what you said, I have no idea how it would even be possible to write
unit tests to verify if the problems I fixed are really fixed.

Several of the fixes I have submitted are based on single-instance error logs with
no reproducer. Many others are compile time fixes or fix problems found with code
inspection (manual or automatic).

If we start shaming people for not providing unit tests, all we'll accomplish is
that people will stop providing bug fixes.

Guenter