On 2022-03-22 1:52 p.m., Sam James wrote:
Hi all,

In Gentoo, we've just got our hands on an RP3440 (PA8800) which seems to quite easily hit inequivalent aliasing issues.

We've found that under some workloads, the machine copes fine, none of that appears in dmesg, and all is well - even for over a week.

But as soon as we start other workloads (the problematic one is building "stages" -- release media for Gentoo), within 30m or so, the machine is in a broken state, with these messages flooding dmesg:

```
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x42994000 and 0x426e1000 in file bash
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x426e1000 and 0x41b56000 in file bash
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x41b56000 and 0x41aae000 in file bash
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x41aae000 and 0x42774000 in file bash
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x42774000 and 0x41202000 in file bash
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x41202000 and 0x428dd000 in file bash
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x41e2c000 and 0x418f6000 in file bash
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x418f6000 and 0x42980000 in file bash
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x42980000 and 0x426cd000 in file bash
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x426cd000 and 0x41b42000 in file bash
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x41b42000 and 0x41a9a000 in file bash
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x41a9a000 and 0x42760000 in file bash
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x42760000 and 0x411ee000 in file bash
Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x411ee000 and 0x428c9000 in file bash
```

I don't think this is new. There are no changes to the code that detects INEQUIVALENT ALIASES in the latest pull. I've seen this before, but it's not occurring in my current builds for rp3440 and c8000. I've been running the for-next changes on c8000 for several weeks.

I suspect a problem with shmat, but I'm not sure.

When it's in this state, GCC ends up ICEing at some point and other userland commands fail too (e.g. last night I tried unpacking a kernel and 'xz' failed the first time, but worked the second).

It might be of note that I think the failures end up happening during an HPPA 1.1 build.

I appreciate this isn't really enough information to solve the problem, but I'm not sure what I need to obtain: any suggestions for how to debug this further & get more information to better receive assistance would be most welcome.

The machine is currently running 5.17.0 along with Helge's tree up to (and including) Linus's pull for 5.18.0 (https://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux.git/commit/?h=for-next&id=a04b1bf574e1f4875ea91f5c62ca051666443200).

We're also using GCC 11.2 (but a snapshot from their stable 11 branch), glibc 2.34 (with latest patches), and latest Binutils 2.37 (with patches from upstream again).

I've also attached the running kernel config in case any suggestions can be made there to either aid debugging or reduce the chances of this issue occurring.

TL;DR: Lots of inequivalent aliases issues when running certain intensive workloads (but not others?), system ends up in a bad state and needs a reboot to function correctly (otherwise userland may misbehave/crash), need more help with how to debug/get more information out of it/narrow it down.

Of course, if needed, we can provide access to the machine for kernel maintainers and show them how to induce a broken state (or do it for them repeatedly) if we can't find a smaller test case.
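As one possible way to get more out of the dmesg flood, the logged address pairs can be reduced to their cache "colour" to see which offsets actually disagree. The following is only a rough sketch: it assumes the parisc alias boundary is 4 MB (the `SHM_COLOUR` value of 0x00400000), and the helper names `parse_alias_lines`, `colour` and `summarize` are made up for illustration, not part of any existing tool.

```python
import re

# Assumption: two mappings are equivalent aliases on parisc only if their
# virtual addresses agree modulo 4 MB (SHM_COLOUR = 0x00400000).
ALIAS_COLOUR = 0x00400000

# Matches the kernel message format seen in the log excerpt above.
LINE_RE = re.compile(
    r"INEQUIVALENT ALIASES (0x[0-9a-fA-F]+) and (0x[0-9a-fA-F]+) in file (\S+)"
)


def parse_alias_lines(text):
    """Return (old_addr, new_addr, file_name) tuples found in dmesg text."""
    return [(int(a, 16), int(b, 16), name) for a, b, name in LINE_RE.findall(text)]


def colour(addr):
    """Offset of addr within the assumed 4 MB alias boundary."""
    return addr & (ALIAS_COLOUR - 1)


def summarize(text):
    """One row per message: file, both addresses, and both colours (hex)."""
    return [
        (name, hex(old), hex(new), hex(colour(old)), hex(colour(new)))
        for old, new, name in parse_alias_lines(text)
    ]
```

Passing saved `dmesg` output to `summarize()` gives one row per message; rows whose two colours differ are the ones the kernel is complaining about, and grouping them by file name may help show whether a single mapping is responsible.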
Dave
--
John David Anglin dave.anglin@xxxxxxxx