[CCing Richard, who apparently faces the same problem according to a
recent comment in the bugzilla ticket mentioned earlier:
https://bugzilla.kernel.org/show_bug.cgi?id=219331#c8

CCing Mario, who might be interested in this and is a good contact when
it comes to issues with AMD stuff like this.

CCing the Btrfs list as JFYI, as all three reporters afaics see Btrfs
misbehavior or corruptions due to this.

Considered to bring Linus in, but decided to wait a bit before doing so.]

On 01.10.24 23:40, Chris Hixon wrote:
> On 10/1/2024, 12:56:49 PM, "Linux regression tracking (Thorsten Leemhuis)" wrote:
>> Basavaraj Natikar, I noticed a report about a regression in
>> bugzilla.kernel.org that appears to be caused by a change of yours:
>>
>> 2105e8e00da467 ("HID: amd_sfh: Improve boot time when SFH is available")
>> [v6.9-rc1]
>>
>> As many (most?) kernel developers don't keep an eye on the bug tracker,
>> I decided to write this mail. To quote from
>> https://bugzilla.kernel.org/show_bug.cgi?id=219331 :
>>
>>> I am getting bad page map errors on kernel version 6.9 or newer.
>>> They always appear within a few minutes of the system being on, if
>>> not immediately upon booting. My system is a Dell Inspiron 7405.
> [...]
>>> [ 23.234632] systemd-journald[611]: File /var/log/journal/a4e3170bc5be4f52a2080fb7b9f93cf0/user-1000.journal corrupted or uncleanly shut down, renaming and replacing.
>>> [ 23.580724] rfkill: input handler enabled
>>> [ 25.652067] rfkill: input handler disabled
>
>>> [ 34.222362] pcie_mp2_amd 0000:03:00.7: Failed to discover, sensors not enabled is 0
>>> [ 34.222379] pcie_mp2_amd 0000:03:00.7: amd_sfh_hid_client_init failed err -95
>
> No sensors detected - do we all have that in common?

Skyler, Richard?

>>> [...]
>> See the ticket for more details and the bisection result. Skyler, the
>> reporter (CCed), later also added:
>>
>>> Occasionally I will not get the usual bad page map error, but
>>> instead some BTRFS errors followed by the file system going read-only.
>>
>> Note, we had and earlier regression caused by this change reported by
>> Chris Hixon that maybe was not solved completely:
>> https://lore.kernel.org/all/3b129b1f-8636-456a-80b4-0f6cce0eef63@xxxxxxxxxxxxx/
>
> This looks like the same issue I reported.

And sounds a lot like what Richard sees, who also sees disk corruption
with Btrfs (see https://bugzilla.redhat.com/show_bug.cgi?id=2314331 ).

>> Chris Hixon: do you still encounter errors, or was your issue
>> resolved/vanished somehow?
>
> I still encounter errors with every kernel/patch I've tested. I've blacklisted
> the amd_sfh module as a workaround, but when the module is inserted, a crash
> similar to those reported will happen soon after the (45 second?)
> detection/initialization timeout. It seems to affect whatever part of the
> kernel next becomes active. I've had disk corruption as well, when BTRFS is
> affected by the memory corruption,

Skyler, did you see btrfs disk corruption as well, just like Chris and
Richard did?

> so I've ended up testing on a USB stick I
> can reformat if necessary. I haven't tested new patches/kernels in a while
> though. I'll get back to you after I've tried the latest mainline. Also note
> that I've tried Fedora Rawhide's debug kernel,