Re: Recurring INEQUIVALENT ALIASES issues and userland corruption/crashes


> On 13 Apr 2022, at 00:45, Sam James <sam@xxxxxxxxxx> wrote:
> 
> 
> 
>> On 12 Apr 2022, at 14:20, John David Anglin <dave.anglin@xxxxxxxx> wrote:
>> 
>> On 2022-04-12 8:27 a.m., John David Anglin wrote:
>>> On 2022-04-12 1:18 a.m., Sam James wrote:
>>>>>> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x411ee000 and 0x428c9000 in file bash
>>>>>> ```
>>>>> It seems all these messages result from a single call to flush_dcache_page. Note the sequential behavior of old_addr
>>>>> and addr, and message times.
>>>> FWIW, from Helge's config on 5.10.108 (config changes on my end: just disabling unneeded devices to speed up build), I have the same
>>>> horrible wall:
>>> This change might help:
>>> https://lore.kernel.org/linux-parisc/YlNw8jzP9OQRKvlV@mx3210.localdomain/T/#u
>>> 
>>> It applies on top of Helge's current for-next tree which is based on 5.18.0-rc1+.
>>> 
>>> The messages will no longer appear with this patch on c8000/rp34xx. However, the loop corruption
>>> might still occur. If that happens, I think the stall detector will trigger, or maybe some other crash.
>>> 
>>> The loop is changed to flush all mount points on machines with PA8800 or PA8900 processors as I
>>> believe these CPUs don't support equivalent aliases.
>> 
>> Thousands of messages aren't useful. I would suggest adding a BUG_ON statement in the loop that
>> triggers on the first message. That might help find the circumstances that cause the problem.
>> 
> 
> Your change *seems* to have prevented the "bad wall"! But now we get some silent runtime corruption
> and binaries crashing (5.18.0_rc2 + for-next + your patch).
> 
> So this seems like a good improvement given those crashes happened previously too, although maybe
> less often.
> 
> Not sure how to get more debugging info yet, there is nothing helpful in dmesg (no messages at all when
> it happens). Suggestions (given it is not hitting that loop)?
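
For triaging older logs that still contain the wall of messages, here is a rough sketch that parses the "INEQUIVALENT ALIASES" lines, keeps only the first hit per file (in the spirit of stopping at the first message), and checks whether the two addresses agree modulo the alias colour. The 4 MB colour value is my assumption based on SHM_COLOUR on parisc, and the helper names (`parse_alias_line`, `equivalent`, `first_per_file`) are made up for illustration; none of this is kernel code.

```python
import re

# Assumption: 4 MB alias colour, matching SHM_COLOUR on parisc.
# Adjust if your kernel uses a different value.
ALIAS_COLOUR = 0x00400000

# Matches the message format seen in the log line quoted above.
LINE_RE = re.compile(
    r"INEQUIVALENT ALIASES (0x[0-9a-fA-F]+) and (0x[0-9a-fA-F]+) in file (\S+)"
)


def parse_alias_line(line):
    """Return (old_addr, addr, filename), or None if this is not an alias message."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16), m.group(3)


def equivalent(old_addr, addr, colour=ALIAS_COLOUR):
    """Two virtual addresses are treated as equivalent if they agree modulo the colour."""
    mask = colour - 1
    return (old_addr & mask) == (addr & mask)


def first_per_file(lines):
    """Keep only the first alias message seen for each file name."""
    seen = {}
    for line in lines:
        parsed = parse_alias_line(line)
        if parsed is None:
            continue
        old_addr, addr, name = parsed
        seen.setdefault(name, (old_addr, addr))
    return seen
```

Passing the lines of a saved dmesg or journal dump to `first_per_file()` reduces thousands of messages to one address pair per file, and `equivalent()` shows whether that pair really differs in its colour bits.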

I spoke slightly too soon. Processes dying and corruption already happened with v1; v2 maybe bought us a bit longer,
but then the same issues started again (processes dying, nothing in dmesg).

Once the "bad state" happens, the system is generally unreliable. I tried to upgrade man-db and then I got
a gcc ICE:
```
/bin/sh ../../libtool  --tag=CC   --mode=compile hppa2.0-unknown-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I../..  -DDEFAULT_TEXT_DOMAIN=\"man-db-gnulib\"   -Wno-cast-qual -Wno-conversion -Wno-float-equal -Wno-sign-compare -Wno-undef -Wno-unused-function -Wno-unused-parameter -Wno-float-conversion -Wimplicit-fallthrough -Wno-pedantic -Wno-sign-conversion -Wno-type-limits -Wno-unsuffixed-float-constants -O2 -pipe -march=2.0 -Wall -c -o glthread/libgnu_la-threadlib.lo `test -f 'glthread/threadlib.c' || echo './'`glthread/threadlib.c
/bin/sh ../../libtool  --tag=CC   --mode=compile hppa2.0-unknown-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I../..  -DDEFAULT_TEXT_DOMAIN=\"man-db-gnulib\"   -Wno-cast-qual -Wno-conversion -Wno-float-equal -Wno-sign-compare -Wno-undef -Wno-unused-function -Wno-unused-parameter -Wno-float-conversion -Wimplicit-fallthrough -Wno-pedantic -Wno-sign-conversion -Wno-type-limits -Wno-unsuffixed-float-constants -O2 -pipe -march=2.0 -Wall -c -o libgnu_la-timespec.lo `test -f 'timespec.c' || echo './'`timespec.c
during RTL pass: reload
In file included from regex.c:74:
regcomp.c: In function ‘parse_expression’:
regcomp.c:2421:1: internal compiler error: Segmentation fault
 2421 | }
      | ^
libtool: compile:  hppa2.0-unknown-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I../.. -DDEFAULT_TEXT_DOMAIN=\"man-db-gnulib\" -Wno-cast-qual -Wno-conversion -Wno-float-equal -Wno-sign-compare -Wno-undef -Wno-unused-function -Wno-unused-parameter -Wno-float-conversion -Wimplicit-fallthrough -Wno-pedantic -Wno-sign-conversion -Wno-type-limits -Wno-unsuffixed-float-constants -O2 -pipe -march=2.0 -Wall -c glthread/threadlib.c  -fPIC -DPIC -o glthread/.libs/libgnu_la-threadlib.o
libtool: compile:  hppa2.0-unknown-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I../.. -DDEFAULT_TEXT_DOMAIN=\"man-db-gnulib\" -Wno-cast-qual -Wno-conversion -Wno-float-equal -Wno-sign-compare -Wno-undef -Wno-unused-function -Wno-unused-parameter -Wno-float-conversion -Wimplicit-fallthrough -Wno-pedantic -Wno-sign-conversion -Wno-type-limits -Wno-unsuffixed-float-constants -O2 -pipe -march=2.0 -Wall -c timespec.c  -fPIC -DPIC -o .libs/libgnu_la-timespec.o
/bin/sh ../../libtool  --tag=CC   --mode=compile hppa2.0-unknown-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I../..  -DDEFAULT_TEXT_DOMAIN=\"man-db-gnulib\"   -Wno-cast-qual -Wno-conversion -Wno-float-equal -Wno-sign-compare -Wno-undef -Wno-unused-function -Wno-unused-parameter -Wno-float-conversion -Wimplicit-fallthrough -Wno-pedantic -Wno-sign-conversion -Wno-type-limits -Wno-unsuffixed-float-constants -O2 -pipe -march=2.0 -Wall -c -o libgnu_la-unistd.lo `test -f 'unistd.c' || echo './'`unistd.c
/bin/sh ../../libtool  --tag=CC   --mode=compile hppa2.0-unknown-linux-gnu-gcc -DHAVE_CONFIG_H -I. -I../..  -DDEFAULT_TEXT_DOMAIN=\"man-db-gnulib\"   -Wno-cast-qual -Wno-conversion -Wno-float-equal -Wno-sign-compare -Wno-undef -Wno-unused-function -Wno-unused-parameter -Wno-float-conversion -Wimplicit-fallthrough -Wno-pedantic -Wno-sign-conversion -Wno-type-limits -Wno-unsuffixed-float-constants -O2 -pipe -march=2.0 -Wall -c -o libgnu_la-dup-safer.lo `test -f 'dup-safer.c' || echo './'`dup-safer.c
0xf61f7313 __libc_start_call_main
        ../sysdeps/nptl/libc_start_call_main.h:58
0xf61f746f __libc_start_main_impl
        /var/tmp/portage/sys-libs/glibc-2.34-r11/work/glibc-2.34/csu/libc-start.c:409
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://bugs.gentoo.org/> for instructions.
make[4]: *** [Makefile:3664: libgnu_la-regex.lo] Error 1
make[4]: *** Waiting for unfinished jobs....
```

(I don't think this is a genuine ICE: it only happens once the system has become "tainted", and it is not reproducible after a reboot during normal activity.)

Best,
sam


