On 27/03/2024 17:31, Hans de Goede wrote:
Hi Philip,
On 3/19/24 5:52 AM, Philip Müller wrote:
On 18/03/2024 17:58, Philip Müller wrote:
I'm currently developing on the OrangePi Neo-01, which ships with two similar touchpads using the Synaptics driver. On 6.7.10 those two devices get detected normally. On 6.8.1 it seems to error out.
I either get none, or at best only one of those two devices.
i2c_hid_acpi: probe of i2c-XXX0001:00 failed with error -110
i2c_hid_acpi: probe of i2c-XXX0002:00 failed with error -110
what would be the best way to debug this?
I found the regression in commit aa69d6974185e9f7a552ba982540a38e34f69690
HID: i2c-hid: Switch i2c_hid_parse() to goto style error handling
I just checked that patch and I don't see anyway how that can create
this regression. I assume you did a git bisect ?
Did you try the last commit in the tree before that commit got added
and verified that that one works where as building a kernel from commit
aa69d6974185e9f itself does not work ?
No I didn't do a git bisect actually. I did it more manually like git
bisect might do it. What I did was to compile just that module and load
it against the 6.7.10 kernel. a9f68ffe1170ca4bc17ab29067d806a354a026e0
is the last commit you find in 6.7 series. All the other commits are
part of the 6.8 series or later. So I looked at the commit titles and
picked a9f68ffe1170ca4bc17ab29067d806a354a026e0 which I compiled the
"out of tree" module. I unloaded the 6.7.10 module, copied the new
compiled one over and did the rmod, modprobe batch script to see how the
touchpads react.
Then I went to compile and tested
af93a167eda90192563bce64c4bb989836afac1f which created the issue with
-110 errors. So I went to aa69d6974185e9f7a552ba982540a38e34f69690 which
also results in the error. So I went back to
96d3098db835d58649b73a5788898bd7672a319b which still works. Since now
6.9 development started, I may try
00aab7dcb2267f2aef59447602f34501efe1a07f to see if that fixes anything.
I can get the touchpads both working in 6.8 code when I do enough tries
by a rmod/modprobe loop. Based on the logs you see, sometimes both are
up, mostly one or none is up. having only two commits of 6.7 code in the
module it works as 6.7 code. Only when "HID: i2c-hid: Switch
i2c_hid_parse() to goto style error handling" gets added I see the error
and can reproduce it on my device as shown in the log.
Before we had
hid_err(hid, "reading report descriptor failed\n");
kfree(rdesc);
followed by the hardcoded return -EIO;
Now we have an redirect to the out function but don't do kfree(rdesc) if
you hit the error anymore.
Also the last return 0; got removed. So maybe what ever error the device
might have hit before might be reverted due to return 0 or not freeing
rdesc when hitting the error might prevent it to load.
Anyhow, that patch is the first patch which changes the function in that
series. Sure it might not be able to compile the whole kernel with it. I
only tested the module based on the given code in an out of tree module
style.
So I might test 6.9rc1 on the device and let you know if that kernel has
the same issue still.
When I use the commit before I can rmmod and modprobe in a batch script using a loop without erroring out to -110. Attached the testing script and dmesg log snippets
#!/bin/bash
for ((n=0;n<5;n++))
do
sudo rmmod i2c_hid_acpi
sleep 1
sudo modprobe i2c_hid_acpi --force-vermagic
sleep 2
done
Ok, so you did try the commit before and that did work. Are you
sure that aa69d6974185e9f was not actually the last working
commit ?
For me 96d3098db835d58649b73a5788898bd7672a319b was the last working
code, as it doesn't change functionality.
AFAICT aa69d6974185e9f makes no functional changes, except for
actually propagating the error from i2c_hid_read_register()
rather then hardcoding -EIO. But that should not matter...
See return 0 and kfree(rdesc) which is now missing.
Note that commit aa69d6974185e9f is part of a series and
I would not be surprised if some other commit in that series
is causing your problem, but aa69d6974185e9f itself seems
rather harmless.
Well, it might also revealed an issue, which was ignored all the time
and by erroring out it might stop the process to continue something. Cos
maybe return 0 had skipped any recorded error before.
If you want me to test or try something else, let me know. I'm open for
ideas.
Regards,
Hans
--
Best, Philip