On Wed, May 11, 2022 at 2:37 PM Michael Cree <mcree@xxxxxxxxxxxx> wrote:
>
> On Sat, May 07, 2022 at 11:27:15AM -0700, Yu Zhao wrote:
> > On Fri, May 6, 2022 at 6:57 PM Hillf Danton <hdanton@xxxxxxxx> wrote:
> > >
> > > On Sat, 7 May 2022 09:21:25 +1200 Michael Cree wrote:
> > > > Alpha kernel has been exhibiting rare and random memory
> > > > corruptions/segfaults in user space since the 5.9.y kernel. First seen
> > > > on the Debian Ports build daemon when running 5.10.y kernel resulting
> > > > in the occasional (one or two a day) build failures with gcc ICEs either
> > > > due to self detected corrupt memory structures or segfaults. Have been
> > > > running 5.8.y kernel without such problems for over six months.
> > > >
> > > > Tried bisecting last year but went off track with incorrect good/bad
> > > > determinations due to rare nature of bug. After trying a 5.16.y kernel
> > > > early this year and seen the bug is still present retried the bisection
> > > > and have got to:
> > > >
> > > > aae466b0052e1888edd1d7f473d4310d64936196 is the first bad commit
> > > > commit aae466b0052e1888edd1d7f473d4310d64936196
> > > > Author: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
> > > > Date: Tue Aug 11 18:30:50 2020 -0700
> > > >
> > > >     mm/swap: implement workingset detection for anonymous LRU
> >
> > This commit seems innocent to me. While not ruling out anything, i.e.,
> > this commit, compiler, qemu, userspace itself, etc., my wild guess is
> > the problem is memory barrier related. Two lock/unlock pairs, which
> > imply two full barriers, were removed. This is not a small deal on
> > Alpha, since it imposes no constraints on cache coherency, AFAIK.
> >
> > Can you please try the attached patch on top of this commit? Thanks!
>
> Thanks, I have that running now for a day without any problem showing
> up, but that's not long enough to be sure it has fixed the problem. Will
> get back to you after another day or two of testing.

Any luck? Thanks!