On 20/02/2024 at 12:35, Thorsten Leemhuis wrote:
[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]
Hi, Thorsten here, the Linux kernel's regression tracker.
Hi, thanks for replying (even if I find your tone a bit harsh, but I
don't blame you - and since English is not my native language, maybe I'm
mistaken).
[1]
https://github.com/torvalds/linux/commit/46a0a2c96f0f47628190f122c2e3d879e590bcbe
[2]
https://github.com/torvalds/linux/commit/2f2bd7cbd1d1548137b351040dc4e037d18cdfdc
[3]
https://github.com/torvalds/linux/commit/43527a0094c10dfbf0d5a2e7979395a38de3ff65
The regression is that a middle click is performed when releasing middle
button after wheel emulation.
How did you identify these three commits? Or do you just suspect that
it's one of them?
No, I didn't "just suspect" that it was one of them. I may not be a
kernel developer but I'm an experienced sysadmin (25+ years). So please
stop taking users for idiots.
First, I compared the three machines I used that have a keyboard with a
TrackPoint: my desktop at home (external "Lenovo ThinkPad Compact
Keyboard with TrackPoint" (not II, not Bluetooth), Debian unstable (I'm
a DM)), my desktop at work (same keyboard, Debian Stable) and my personal
laptop (ThinkPad X270, internal keyboard, Debian Stable but with backports).
The machine at work had a 5.10 kernel at the time, and the other ones
had a 6.6, but only the machines with an external keyboard exhibited the
spurious middle-clicks. So I compared the loaded HID drivers, and
noticed that both of them had hid_lenovo loaded, whereas the laptop did not.
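The driver comparison described above can be sketched roughly as follows. This is only an illustration: `hid_modules` is a hypothetical helper name, and the file names in the usage comments are placeholders, not something from the original report.

```shell
#!/bin/sh
# hid_modules: print the names of loaded kernel modules that contain "hid".
# Reads lsmod-style text on stdin (first column = module name).
hid_modules() {
    awk '$1 ~ /hid/ { print $1 }' | sort
}

# On each machine, save the list, then compare the files:
#   lsmod | hid_modules > "hid-$(hostname).txt"
#   diff hid-desktop.txt hid-laptop.txt
```

A module that shows up only on the affected machines (here, hid_lenovo) is a reasonable first suspect, though it is not proof by itself.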
Confident that I had probably pinpointed the faulty driver, I simply
looked at the file history on GitHub, and saw that those three commits
were dated from after the time when the bug appeared; moreover, the
comments did mention things related to wheel emulation and spurious
middle-clicks.
So, no, I didn't "just suspect" that they were responsible, and I hope
you'll admit my method was sound, and that my conclusion is a pretty
strong (not to say "almost certain") probability.
And did you try to check which of the three is the actual culprit?
Either by reverting them on top of master or by checking the parent for
each of the commits (git show '2f2bd7cbd1d^' shows the parent for
2f2bd7cbd1d).
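Both suggestions (reverting on top of master, or checking out a commit's parent) can be sketched in a few lines of shell. This is a minimal sketch, not a tested procedure for this bug: the function names are hypothetical, the hashes are abbreviated from links [1]-[3], and the branch name in the comments is made up.

```shell
#!/bin/sh
# Suspect commits, abbreviated from links [1]-[3] in the report.
SUSPECTS="46a0a2c96f0f 2f2bd7cbd1d1 43527a0094c1"

# parent_of: print the parent of a commit ("<hash>^" names the tree just before it).
parent_of() {
    git rev-parse "$1^"
}

# revert_newest_first: revert the given commits one by one, newest first.
# "git rev-list --no-walk" lists only the named commits, sorted by commit time
# with the most recent first, so later changes are undone before earlier ones.
revert_newest_first() {
    for c in $(git rev-list --no-walk "$@"); do
        git revert --no-edit "$c" || return 1
    done
}

# Inside a kernel checkout, on a scratch branch:
#   git switch -c test-hid-lenovo master
#   revert_newest_first $SUSPECTS
# then build and test; or look at one parent at a time:
#   git checkout "$(parent_of 2f2bd7cbd1d1)"
```

Reverting one commit at a time and retesting after each revert would show which of the three matters, at the cost of more builds.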
I admit I didn't. I haven't compiled my own kernels for ages. I used to
do it in the past, but I came to trust Debian's kernels and rely on the
maintainers' work. But read below.
On Debian Stable, the last working kernel was 5.10.127, the regression
appeared in 5.10.136 (I read all changelogs on kernel.org between those
two releases but couldn't find anything about hid-lenovo, so I can't
tell exactly in which release the regression appeared; Debian upgraded
directly from .127 to .136).
Why not bisect between .127 and .136 then?
I have heard that term before (and I understand the mathematical meaning
of it), but I have never done it with a Git tree. I read the guide you
mentioned below, but it seems much too complicated and too long to me
for just verifying whether those three commits are indeed the cause of
the regression (which I'm almost sure of, as stated above).
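For reference, the core of a bisection between the two versions quoted above is short. The following is a rough sketch under assumptions: it presumes a clone of the stable kernel tree in which the tags v5.10.127 and v5.10.136 exist, the helper name `bisect_hid_lenovo` is hypothetical, and limiting the search to drivers/hid/hid-lenovo.c is a shortcut that only helps if the culprit really touches that file.

```shell
#!/bin/sh
# bisect_hid_lenovo GOOD BAD: start a bisection restricted to commits that
# touch the hid-lenovo driver. "git bisect start" takes the bad revision
# first, then the good one, then an optional path after "--".
bisect_hid_lenovo() {
    git bisect start "$2" "$1" -- drivers/hid/hid-lenovo.c
}

# In the stable tree:
#   bisect_hid_lenovo v5.10.127 v5.10.136
# After each step: build, boot, use the middle button for a while, then report:
#   git bisect good      # no spurious middle click seen
#   git bisect bad       # spurious middle click seen
# Once git names the first bad commit:
#   git bisect log > bisect.log && git bisect reset
```

Because the bug only shows up after a varying amount of time, a "good" verdict needs a longer test period than a "bad" one.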
So in the meantime, I decided to follow my hunch and recompile only the
hid_lenovo module (following the guide at [6], updating it slightly by
manually removing kernel signing options in .config, since I obviously
don't have Debian's signing keys, and replacing "make
SUBDIRS=drivers/..." with "make M=..." as suggested by make), after
un-applying those three patches in reverse order.
[6] https://askubuntu.com/a/338403/387067
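The build step described above might look roughly like this. It is a sketch under stated assumptions, not the exact commands that were run: the helper name `build_module_dir`, the source path and the `drivers/hid` directory in the usage comment are guesses for illustration, and the tree is assumed to be already configured for the running kernel with the three patches un-applied.

```shell
#!/bin/sh
# build_module_dir SRC DIR: rebuild only the modules under DIR inside the
# kernel source tree SRC.
# MAKE can be overridden (for a dry run, for instance); it defaults to "make".
build_module_dir() {
    # Current kernels take M=<dir>; SUBDIRS=<dir> is the older spelling
    # that make suggests replacing.
    ${MAKE:-make} -C "$1" M="$2" modules
}

# Hypothetical invocation (path and directory are assumptions):
#   build_module_dir ~/src/linux-6.6.15 drivers/hid
```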
The HID modules built successfully, and after copying my modified
hid-lenovo.ko to /usr/lib/modules/6.6.15-amd64/updates/ and running
'depmod -a', the module loaded fine with Debian's kernel (I don't use
Secure Boot on this machine).
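The installation step can be sketched the same way. Again this is only an illustration: `install_override` and the `DESTROOT` variable are hypothetical names introduced here (the latter to allow a trial run in a scratch directory), and the real commands need root.

```shell
#!/bin/sh
# install_override KO [KVER]: copy a rebuilt module into the "updates"
# directory of the given kernel version (default: the running kernel) and
# refresh the module dependency data.
install_override() {
    kver=${2:-$(uname -r)}
    dest="${DESTROOT:-}/usr/lib/modules/$kver/updates"
    mkdir -p "$dest" && cp "$1" "$dest/" || return 1
    # Skip depmod for scratch-directory trials; otherwise rebuild modules.dep
    # so that modprobe can find the copy under updates/.
    [ -n "${DESTROOT:-}" ] || depmod -a "$kver"
}

# Then reload the driver and check which file would be loaded:
#   install_override drivers/hid/hid-lenovo.ko
#   modprobe -r hid_lenovo && modprobe hid_lenovo
#   modinfo -n hid_lenovo     # expected to print a path under .../updates/
```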
I'll let a few days pass (remember, the bug doesn't happen immediately
but only after a varying amount of time) and I'll report here if the
spurious middle-clicks happened again or not.
Notes:
1/ Thank you for (indirectly) giving me this idea. Maybe this relatively
simple procedure should be made available somewhere on Debian's wiki
(instead of an outdated, but still useful, answer on AskUbuntu).
2/ Please note that I did it only for the unstable kernel; unfortunately, I
can't do the same for the stable kernel, since I don't have access to my
machine at work anymore (my freelance contract ended one week ago) and
I don't have any other machine at home exhibiting this bug. So I won't
be able to test it on a stable kernel.
I reported it in Debian [4], and apparently I'm not the only person
suffering from it [5].
[4] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1058758#32
[5] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1058758#42
I would understand if such bugs ended up in a development kernel
like the ones provided by Debian Unstable, but not in stable kernels
like the ones provided by Debian Stable.
A bug report like yours can do the trick sometimes, as it might be
enough to ring a bell for one of the developers. But given that nobody
replied yet it looks like that is not the case. Then you most likely
will need to perform a bisection to identify the exact commit that broke
things.
Nobody amongst the developers, yes, I'll give you that. But the comment
I linked from the Debian BTS, plus another bug report I found in the
Input mailing list [7], show that I'm not the only user complaining about
the recent regressions.
[7]
https://lore.kernel.org/linux-input/CACSVgagaEHO2zoYQ8zDBrMT9OvT8R5B_h3dxfZuLQFAUBtKMmQ@xxxxxxxxxxxxxx
Regards,
--
Raphaël Halimi