Re: Recurring INEQUIVALENT ALIASES issues and userland corruption/crashes

Hi Sam,

On 3/22/22 18:52, Sam James wrote:
> In Gentoo, we've just got our hands on an RP3440 (PA8800) which seems to quite easily hit inequivalent aliasing issues.
>
> We've found that under some workloads, the machine copes fine, none of that appears in dmesg, and all is well - even for
> over a week. But as soon as we start other workloads (the problematic one is building "stages" -- release media for Gentoo),
> within 30m or so, the machine is in a broken state, with these messages flooding dmesg:
> ```
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x42994000 and 0x426e1000 in file bash
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x426e1000 and 0x41b56000 in file bash
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x41b56000 and 0x41aae000 in file bash
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x41aae000 and 0x42774000 in file bash
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x42774000 and 0x41202000 in file bash
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x41202000 and 0x428dd000 in file bash
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x41e2c000 and 0x418f6000 in file bash
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x418f6000 and 0x42980000 in file bash
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x42980000 and 0x426cd000 in file bash
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x426cd000 and 0x41b42000 in file bash
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x41b42000 and 0x41a9a000 in file bash
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x41a9a000 and 0x42760000 in file bash
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x42760000 and 0x411ee000 in file bash
> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x411ee000 and 0x428c9000 in file bash
> ```
>
> When it's in this state, GCC ends up ICEing at some point and other userland commands fail too (e.g. last night
> I tried unpacking a kernel and 'xz' failed the first time, but worked the second). It might be of note that I think
> the failures end up happening during a HPPA 1.1 build.
>
> I appreciate this isn't really enough information to solve the problem, but I'm not sure what I need to obtain:
> any suggestions for how to debug this further & get more information to better receive assistance would be most welcome.
>
> The machine is currently running 5.17.0 along with Helge's tree up to (and including) Linus's pull for 5.18.0
> (https://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux.git/commit/?h=for-next&id=a04b1bf574e1f4875ea91f5c62ca051666443200).

The INEQUIVALENT ALIASES messages are most likely not related to the instability
of your machine. I see them randomly on the Debian buildd servers as well.

Instead of using the latest (development) kernels, I'd suggest that you
first try a "stable" kernel.
On the Debian buildd servers I'm currently running kernel 5.10.106+, which is pretty stable.
I think Dave is running 5.16.x quite OK.

> We're also using GCC 11.2 (but a snapshot from their stable 11 branch), glibc 2.34 (with latest patches), and latest
> Binutils 2.37 (with patches from upstream again).
>
> I've also attached the running kernel config in case any suggestions can be made there to either aid debugging or
> reduce the chances of this issue occurring.
>
> TL;DR: Lots of inequivalent aliases issues when running certain intensive workloads (but not others?), system ends up
> in a bad state and needs a reboot to function correctly (otherwise userland may misbehave/crash), need more help
> with how to debug/get more information out of it/narrow it down.
>
> Of course, if needed, we can provide access to the machine for kernel maintainers and show them how to induce a broken
> state (or do it for them repeatedly) if we can't find a smaller test case.

Is there any other output in dmesg which is not INEQUIVALENT ALIASES?
E.g. "stuck processes" messages?
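
One way to check this is to save the log (for example with `dmesg > dmesg.txt`) and drop the alias warnings, so that anything else stands out. The following is only a minimal sketch, not a tested tool: the function name `other_kernel_lines`, the `MARKER` constant and the `dmesg.txt` file name are made up for illustration, and it assumes the warning text always contains the literal string "INEQUIVALENT ALIASES" as in the excerpt above.

```python
import sys

# Assumption: every alias warning contains this literal text,
# as in the log excerpt quoted above.
MARKER = "INEQUIVALENT ALIASES"


def other_kernel_lines(lines):
    """Return the non-empty log lines that are not alias warnings."""
    return [
        line.rstrip("\n")
        for line in lines
        if line.strip() and MARKER not in line
    ]


# Only runs when a log file path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        for line in other_kernel_lines(log):
            print(line)
```

Running it as `python3 filter_dmesg.py dmesg.txt` (the script name is hypothetical) prints only the remaining kernel messages, which are the ones worth posting to the list.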

Helge



