Hi Sam,

On 3/22/22 18:52, Sam James wrote:
> In Gentoo, we've just got our hands on an RP3440 (PA8800) which seems to quite easily hit inequivalent aliasing issues.
>
> We've found that under some workloads, the machine copes fine, none of that appears in dmesg, and all is well - even for
> over a week. But as soon as we start other workloads (the problematic one is building "stages" -- release media for Gentoo),
> within 30m or so, the machine is in a broken state, with these messages flooding dmesg:
> ```
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x42994000 and 0x426e1000 in file bash
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x426e1000 and 0x41b56000 in file bash
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x41b56000 and 0x41aae000 in file bash
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x41aae000 and 0x42774000 in file bash
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x42774000 and 0x41202000 in file bash
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x41202000 and 0x428dd000 in file bash
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x41e2c000 and 0x418f6000 in file bash
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x418f6000 and 0x42980000 in file bash
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x42980000 and 0x426cd000 in file bash
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x426cd000 and 0x41b42000 in file bash
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x41b42000 and 0x41a9a000 in file bash
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x41a9a000 and 0x42760000 in file bash
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x42760000 and 0x411ee000 in file bash
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x411ee000 and 0x428c9000 in file bash
> ```
>
> When it's in this state, GCC ends up ICEing at some point and other userland commands fail too (e.g. last night
> I tried unpacking a kernel and 'xz' failed the first time, but worked the second). It might be of note that I think
> the failures end up happening during a HPPA 1.1 build.
>
> I appreciate this isn't really enough information to solve the problem, but I'm not sure what I need to obtain:
> any suggestions for how to debug this further & get more information to better receive assistance would be most welcome.
>
> The machine is currently running 5.17.0 along with Helge's tree up to (and including) Linus's pull for 5.18.0
> (https://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux.git/commit/?h=for-next&id=a04b1bf574e1f4875ea91f5c62ca051666443200).

The INEQUIVALENT ALIASES messages are most likely not related to the instability of your machine.
I see them randomly on the Debian buildd servers as well.

Instead of using the latest (development) kernels, I'd suggest that you first try with a "stable" kernel.
On the Debian buildd servers I'm currently running kernel 5.10.106+, which is pretty stable.
I think Dave is running 5.16.x quite ok.

> We're also using GCC 11.2 (but a snapshot from their stable 11 branch), glibc 2.34 (with latest patches), and latest
> Binutils 2.37 (with patches from upstream again).
>
> I've also attached the running kernel config in case any suggestions can be made there to either aid debugging or
> reduce the chances of this issue occurring.
>
> TL;DR: Lots of inequivalent aliases issues when running certain intensive workloads (but not others?), system ends up
> in a bad state and needs a reboot to function correctly (otherwise userland may misbehave/crash), need more help
> with how to debug/get more information out of it/narrow it down.
>
> Of course, if needed, we can provide access to the machine for kernel maintainers and show them how to induce a broken
> state (or do it for them repeatedly) if we can't find a smaller test case.

Is there any other output in dmesg which is not INEQUIVALENT ALIASES?
E.g. "stuck processes" messages?

Helge
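
One way to answer the dmesg question above when the log is flooded is to filter out the alias messages and look at what remains. The following is only a minimal sketch, not something taken from this thread: the function name `split_dmesg` is hypothetical, and the keyword list used to guess at hang-related messages is an assumption that will need adjusting to whatever the kernel actually prints.

```python
ALIAS_MARKER = "INEQUIVALENT ALIASES"

# Assumption: illustrative substrings that often show up in hang or stall
# reports. This is not an exhaustive or authoritative list.
HANG_HINTS = ("stuck", "stall", "lockup", "blocked for more than")


def split_dmesg(lines):
    """Separate alias noise from everything else in a dmesg log.

    Returns (other_lines, hang_lines, alias_count), where hang_lines is the
    subset of other_lines matching one of the HANG_HINTS substrings.
    """
    other, hangs, alias_count = [], [], 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        if ALIAS_MARKER in line:
            alias_count += 1  # count the flood, but do not keep it
            continue
        other.append(line)
        lowered = line.lower()
        if any(hint in lowered for hint in HANG_HINTS):
            hangs.append(line)
    return other, hangs, alias_count
```

It accepts any iterable of lines, so it can be fed an open file holding a saved log or the split output of `dmesg`. Whatever ends up in the first returned list is the part worth posting to the list.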