Re: Recurring INEQUIVALENT ALIASES issues and userland corruption/crashes

> On 12 Apr 2022, at 14:20, John David Anglin <dave.anglin@xxxxxxxx> wrote:
> 
> On 2022-04-12 8:27 a.m., John David Anglin wrote:
>> On 2022-04-12 1:18 a.m., Sam James wrote:
>>>>> Mar 22 04:19:55 muta.hppa.dev.gentoo.org kernel: INEQUIVALENT ALIASES 0x411ee000 and 0x428c9000 in file bash
>>>>> ```
>>>> It seems all these messages result from a single call to flush_dcache_page.  Note the sequential behavior of old_addr
>>>> and addr, and message times.
>>> FWIW, from Helge's config on 5.10.108 (config changes on my end: just disabling unneeded devices to speed up the build), I have the same
>>> horrible wall:
>> This change might help:
>> https://lore.kernel.org/linux-parisc/YlNw8jzP9OQRKvlV@mx3210.localdomain/T/#u
>> 
>> It applies on top of Helge's current for-next tree which is based on 5.18.0-rc1+.
>> 
>> The messages will no longer appear with this patch on c8000/rp34xx. However, the loop corruption
>> might still occur.  If that happens, I think the stall detector will trigger, or maybe some other crash.
>> 
>> The loop is changed to flush all mount points on machines with PA8800 or PA8900 processors as I
>> believe these CPUs don't support equivalent aliases.
> 
> Thousands of messages aren't useful.  I would suggest adding a BUG_ON statement in the loop that
> triggers on the first message.  That might help find the circumstances that cause the problem.
> 

Your change *seems* to have prevented the "bad wall"! But now we get some silent runtime corruption
and binaries crashing (5.18.0_rc2 + for-next + your patch).

So this still seems like an improvement, given that those crashes happened previously too, although perhaps
less often.

I'm not sure yet how to get more debugging info: there is nothing helpful in dmesg (no messages at all when
it happens). Any suggestions, given that it is not hitting that loop?

> I think the loop may get corrupted when the mapping code fails to find an address aligned on a 4MB
> boundary.  Another possibility might be a locking issue.  In both these cases, the messages are just a
> symptom of a problem elsewhere.

I think we are getting somewhere based on the above. Big thanks.

> 
> Dave
> 
> --
> John David Anglin dave.anglin@xxxxxxxx
