Hi Philip, On 3/27/24 4:47 PM, Philip Müller wrote: > On 27/03/2024 17:31, Hans de Goede wrote: >> Hi Philip, >> >> On 3/19/24 5:52 AM, Philip Müller wrote: >>> On 18/03/2024 17:58, Philip Müller wrote: >>>> I'm currently developing on the OrangePi Neo-01, which ships with two similar touchpads using the Synaptics driver. On 6.7.10 those two devices get detected normally. On 6.8.1 it seems to error out. >>>> >>>> I either get none, or at best only one of those two devices. >>>> >>>> i2c_hid_acpi: probe of i2c-XXX0001:00 failed with error -110 >>>> i2c_hid_acpi: probe of i2c-XXX0002:00 failed with error -110 >>>> >>>> what would be the best way to debug this? >>>> >>> >>> I found the regression in commit aa69d6974185e9f7a552ba982540a38e34f69690 >>> HID: i2c-hid: Switch i2c_hid_parse() to goto style error handling >> >> I just checked that patch and I don't see anyway how that can create >> this regression. I assume you did a git bisect ? >> >> Did you try the last commit in the tree before that commit got added >> and verified that that one works where as building a kernel from commit >> aa69d6974185e9f itself does not work ? >> > > No I didn't do a git bisect actually. I did it more manually like git bisect might do it. What I did was to compile just that module and load it against the 6.7.10 kernel. a9f68ffe1170ca4bc17ab29067d806a354a026e0 is the last commit you find in 6.7 series. All the other commits are part of the 6.8 series or later. So I looked at the commit titles and picked a9f68ffe1170ca4bc17ab29067d806a354a026e0 which I compiled the "out of tree" module. I unloaded the 6.7.10 module, copied the new compiled one over and did the rmod, modprobe batch script to see how the touchpads react. > > Then I went to compile and tested af93a167eda90192563bce64c4bb989836afac1f which created the issue with -110 errors. So I went to aa69d6974185e9f7a552ba982540a38e34f69690 which also results in the error. So I went back to 96d3098db835d58649b73a5788898bd7672a319b which still works. Since now 6.9 development started, I may try 00aab7dcb2267f2aef59447602f34501efe1a07f to see if that fixes anything. > > I can get the touchpads both working in 6.8 code when I do enough tries by a rmod/modprobe loop. Based on the logs you see, sometimes both are up, mostly one or none is up. having only two commits of 6.7 code in the module it works as 6.7 code. Only when "HID: i2c-hid: Switch i2c_hid_parse() to goto style error handling" gets added I see the error and can reproduce it on my device as shown in the log. > > Before we had > > hid_err(hid, "reading report descriptor failed\n"); > kfree(rdesc); > > followed by the hardcoded return -EIO; > > Now we have an redirect to the out function but don't do kfree(rdesc) if you hit the error anymore. > > Also the last return 0; got removed. So maybe what ever error the device might have hit before might be reverted due to return 0 or not freeing rdesc when hitting the error might prevent it to load. > > Anyhow, that patch is the first patch which changes the function in that series. Sure it might not be able to compile the whole kernel with it. I only tested the module based on the given code in an out of tree module style. > > So I might test 6.9rc1 on the device and let you know if that kernel has the same issue still. > >>> When I use the commit before I can rmmod and modprobe in a batch script using a loop without erroring out to -110. Attached the testing script and dmesg log snippets >>> >>> #!/bin/bash >>> for ((n=0;n<5;n++)) >>> do >>> sudo rmmod i2c_hid_acpi >>> sleep 1 >>> sudo modprobe i2c_hid_acpi --force-vermagic >>> sleep 2 >>> done >> >> Ok, so you did try the commit before and that did work. Are you >> sure that aa69d6974185e9f was not actually the last working >> commit ? > > For me 96d3098db835d58649b73a5788898bd7672a319b was the last working code, as it doesn't change functionality. > >> >> AFAICT aa69d6974185e9f makes no functional changes, except for >> actually propagating the error from i2c_hid_read_register() >> rather then hardcoding -EIO. But that should not matter... > > See return 0 and kfree(rdesc) which is now missing. The new code is (on error) "goto out;" and then out: looks like this: out: if (!use_override) kfree(rdesc); return ret; Since the "goto out;" is an else branch of "if (use_override) { } else {}" we know use_override is false here, so !use_override is true and the kfree() still happens. Also even if that were to no longer happen that would just be a small memleak and would not explain the probe failures you are seeing. As for "return 0" vs "return ret", the code at the end of i2c_hid_parse() before commit aa69d6974185e9f looked like this: if (ret) { dbg_hid("parsing report descriptor failed\n"); return ret; } return 0; So this would also always return ret because if ret is non 0 we always hit if (ret) return ret; and if ret is 0 then return 0 and return ret are functionally the same. So since the functionality at the end of the function did not change, it still always ends up returning ret, but now does it in a less circumspect way. Which leaves just the change in the: hid_err(hid, "reading report descriptor failed\n"); code path which used to return -EIO and now does goto out which ends up returning ret which is the actual return value of i2c_hid_read_register() instead of hardcoding -EIO. I checked and nothing in the HID core treats -EIO in a special way. So I still do not see how commit aa69d6974185e9f can break anything and I really wonder if you did not make a mistake during testing. Can you try building a clean broken kernel (the entire kernel) and then in reverse order (of them being committed) revert my recent i2c-hid changes 1 by one and after each revert check if things start working again ? And then see which is the first revert after which things start working again, that one is the culprit and I expect you to find a different commit then aa69d6974185e9f. Unless I'm missing / overlooking something in commit aa69d6974185e9f... Note I'm going offline for a long weekend and I won't be replying to emails until next week Tuesday. Regards, Hans