On 2024-05-09 13:10, John David Anglin wrote:
On 2024-05-08 4:52 p.m., John David Anglin wrote:
with no accompanying stack trace and then the BMC would restart the whole
machine automatically. These were infrequent enough that the segfaults
were the bigger problem, but after applying this patch on top of 6.8, this
changed the dynamic. It seems to occur during builds with varying I/O
loads. For example, I was able to build gcc fine, with no segfaults, but
I was unable to build perl, a much smaller build, without crashing the
machine. I did not observe any segfaults over the day or two I ran this
patch, but that's not an unheard-of stretch of time even without it, and I
am being forced to revert because of the panics.
Looks like there is a problem with 6.8. I'll do some testing with it.
So far, I haven't seen any panics with 6.8.9 but I have seen some random
segmentation faults in the gcc testsuite. I looked at one ld fault in some
detail. 18 contiguous words in the elf_link_hash_entry struct were zeroed
starting with the last word in the bfd_link_hash_entry struct causing the
fault. The section pointer was zeroed.
18 words is a rather strange number of words to corrupt and corruption
doesn't seem related to object structure. In any case, it is not page related.
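As a rough illustration of why an 18-word span does not look page related, the sketch below compares the size of the span with a page size. The 4 KiB page size and the 4- and 8-byte word sizes are assumptions, not figures from the report, and the helper name is hypothetical.

```python
# Assumption: 4 KiB pages. The report does not state the page size.
PAGE_SIZE = 4096


def corrupted_bytes(words, word_size):
    """Size in bytes of a run of `words` machine words (hypothetical helper)."""
    return words * word_size


# Assumption: try both 32-bit (4-byte) and 64-bit (8-byte) words,
# since the report does not say which applies.
for word_size in (4, 8):
    span = corrupted_bytes(18, word_size)
    # A page-related corruption would typically cover a whole page or an
    # even fraction of one; check whether the span divides a page evenly.
    divides_page = PAGE_SIZE % span == 0
    print(f"word={word_size}B span={span}B divides page evenly: {divides_page}")
```

Under those assumptions the span is 72 or 144 bytes, and neither divides a 4096-byte page evenly.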
It's really hard to tell how this happens. The corrupt object was at a
slightly different location than it is when ld is run under gdb. Can't
duplicate in gdb.
Dave
Dave, not sure how much testing you have done with current mainline kernels,
but I've had to temporarily give up on 6.8 and 6.9 for now, as most heavy
builds quickly hit that kernel panic. 6.6 does not seem to have the problem
though. The patch from this thread does not seem to have made a difference
one way or the other w.r.t. segfaults.