At least according to the docs I see, fakenuma is x86-specific. There
are multi-socket machines, but the one I have is single-socket
single-core.
I can provide access to this machine to play around with it though!
Either simple shell access or serial access if some kernel poking is
needed.
Would that be helpful or is a NUMA system going to be required for
debugging?
-------- Original Message --------
Subject: Re: ext4 corruption on alpha with 4.20.0-09062-gd8372ba8ce28
Date: 2022-08-26 07:04
From: Vlastimil Babka <vbabka@xxxxxxx>
To: Jan Kara <jack@xxxxxxx>, matoro
<matoro_mailinglist_kernel@xxxxxxxxx>
On 8/26/22 12:55, Jan Kara wrote:
On Thu 25-08-22 11:05:48, matoro wrote:
Hello all, I know this is quite an old thread. I recently acquired some
alpha hardware and have run into this exact same problem on the latest
stable kernels (5.18 and 5.19). CONFIG_COMPACTION seems to be totally
broken and causes userspace to be extremely unstable: random segfaults,
corruption of glibc data structures, gcc ICEs, etc. It seems most
noticeable during tasks with heavy I/O load.
My hardware is a DS15 (Titan), so only slightly newer than the Tsunamis
mentioned earlier. The problem is greatly exacerbated when using a
machine-optimized kernel (CONFIG_ALPHA_TITAN) over one with
CONFIG_ALPHA_GENERIC. But it still doesn't go away on a generic kernel,
it just pops up less often, usually during very I/O-heavy tasks like
checking out a tag in the kernel repo.
However all of this seems to be dependent on CONFIG_COMPACTION. With
this toggled off all problems disappear, regardless of other options. I
tried reverting the commit 88dbcbb3a4847f5e6dfeae952d3105497700c128
mentioned earlier in the thread (the structure has moved to a different
file but was otherwise the same), but it unfortunately did not make a
difference.
Since this doesn't seem to have a known cause or an easy fix, would it
be reasonable to just add a Kconfig dep to disable it automatically on
alpha?
Thanks for the report. I guess this just confirms that migration of
pagecache pages is somehow broken on Alpha. Maybe we are missing a flush
of some cache specific to Alpha? Or maybe the page migration code is not
safe wrt the peculiar memory ordering Alpha has... I think this will
need someone with Alpha HW and willingness to dive into MM internals to
debug this. Added Vlasta to CC mostly for awareness and in case it rings
some bells :).
Hi, doesn't ring any bells unfortunately. Does corruption also happen
when mmapping a file and applying mbind() with MPOL_MF_MOVE or
migrate_pages()? That should allow more controlled migration experiments
than through compaction. But that would also need a NUMA machine or
fakenuma support, dunno if alpha has that?
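
For reference, the mbind() experiment suggested above could look
roughly like the sketch below. It is only a sketch under assumptions,
not a tested reproducer: it assumes libnuma is installed (it is used
here only for its mbind() wrapper), that the machine has at least two
memory nodes, and that the target file lives on the filesystem under
test. The helper names (make_pattern, first_mismatch,
migrate_and_check) are made up for illustration. The MPOL_* values are
copied from <linux/mempolicy.h>.

```python
import ctypes
import ctypes.util
import mmap
import os
import sys

# Values from <linux/mempolicy.h>.
MPOL_BIND = 2
MPOL_MF_STRICT = 1 << 0   # fail if some pages could not be moved
MPOL_MF_MOVE = 1 << 1     # migrate pages mapped only by this process


def make_pattern(length):
    """Deterministic byte pattern that also varies from page to page."""
    return bytes((i * 31 + (i >> 8)) & 0xFF for i in range(length))


def first_mismatch(actual, expected):
    """Return the offset of the first differing byte, or -1 if equal."""
    if actual == expected:
        return -1
    for i, (a, b) in enumerate(zip(actual, expected)):
        if a != b:
            return i
    return min(len(actual), len(expected))


def migrate_and_check(path, node, length=64 * mmap.PAGESIZE):
    """Fill a shared file mapping, move it to `node`, verify contents."""
    libname = ctypes.util.find_library("numa")
    if libname is None:
        raise OSError("libnuma not found (assumed to provide mbind())")
    numa = ctypes.CDLL(libname, use_errno=True)
    numa.mbind.restype = ctypes.c_long
    numa.mbind.argtypes = [ctypes.c_void_p, ctypes.c_ulong, ctypes.c_int,
                           ctypes.POINTER(ctypes.c_ulong), ctypes.c_ulong,
                           ctypes.c_uint]

    mask = ctypes.c_ulong(1 << node)
    maxnode = ctypes.sizeof(mask) * 8
    if not 0 <= node < maxnode - 1:
        raise ValueError("node does not fit in a one-word nodemask")

    expected = make_pattern(length)
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, length)
        mm = mmap.mmap(fd, length, mmap.MAP_SHARED,
                       mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        os.close(fd)

    with mm:
        mm[:] = expected                      # fault every page in
        buf = (ctypes.c_char * length).from_buffer(mm)
        try:
            rc = numa.mbind(ctypes.addressof(buf), length, MPOL_BIND,
                            ctypes.byref(mask), maxnode,
                            MPOL_MF_MOVE | MPOL_MF_STRICT)
            err = ctypes.get_errno()
        finally:
            del buf                           # release before mm closes
        if rc != 0:
            raise OSError(err, os.strerror(err))
        return first_mismatch(mm[:], expected)


if __name__ == "__main__" and len(sys.argv) == 3:
    offset = migrate_and_check(sys.argv[1], int(sys.argv[2]))
    print("no mismatch" if offset < 0 else "mismatch at offset %d" % offset)
```

Run it with a file path and a target node number as arguments,
ideally in a loop that alternates between two nodes. On a single-node
machine the mbind() call is expected to fail rather than migrate
anything, so this only helps where a second node (real or emulated)
is available.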