Re: [PATCH] parisc: Try to fix random segmentation faults in package builds

matoro <matoro_mailinglist_kernel@xxxxxxxxx> · Wed, 29 May 2024 11:54:00 -0400

On 2024-05-09 13:10, John David Anglin wrote:
> On 2024-05-08 4:52 p.m., John David Anglin wrote:
>>> with no accompanying stack trace and then the BMC would restart the whole
>>> machine automatically.  These were infrequent enough that the segfaults
>>> were the bigger problem, but applying this patch on top of 6.8 changed
>>> the dynamic.  It seems to occur during builds with varying I/O loads.
>>> For example, I was able to build gcc fine, with no segfaults, but I was
>>> unable to build perl, a much smaller build, without crashing the machine.
>>> I did not observe any segfaults over the day or two I ran this patch, but
>>> that's not an unheard-of stretch of time even without it, and I am being
>>> forced to revert because of the panics.
>> Looks like there is a problem with 6.8.  I'll do some testing with it.
> So far, I haven't seen any panics with 6.8.9, but I have seen some random
> segmentation faults in the gcc testsuite.  I looked at one ld fault in some
> detail.  Eighteen contiguous words in the elf_link_hash_entry struct were
> zeroed, starting with the last word in the bfd_link_hash_entry struct, and
> this caused the fault.  The section pointer was zeroed.
>
> 18 words is a rather strange number of words to corrupt, and the corruption
> doesn't seem related to the object structure.  In any case, it is not
> page-related.
>
> It's really hard to tell how this happens.  The corrupt object was at a
> slightly different location than it is when ld is run under gdb.  I can't
> reproduce the fault in gdb.
>
> Dave

Dave, I'm not sure how much testing you have done with current mainline
kernels, but I've had to give up on 6.8 and 6.9 for now, as most heavy builds
quickly hit that kernel panic.  6.6 does not seem to have the problem, though.
The patch from this thread does not seem to have made a difference one way or
the other with respect to the segfaults.