On 11/10/2024 12.24, David Hildenbrand wrote:
During testing, it was found that we can get PMD mappings in processes
where THP (and more precisely, PMD mappings) are supposed to be disabled.
While it works as expected for anon+shmem, the pagecache is the problematic
bit.
For s390 KVM this currently means that a VM backed by a file located on
a filesystem with large folio support can crash when KVM tries to access
the problematic page, because the readahead logic might decide to use
a PMD-sized THP and faulting it into the page tables will install a
PMD mapping, something that s390 KVM cannot tolerate.
This might also be a problem with HW that does not support PMD mappings,
but I did not try reproducing it.
Fix it by respecting the ways to disable THPs when deciding whether we
can install a PMD mapping. khugepaged should already be taking care of
not collapsing if THPs are effectively disabled for the hw/process/vma.
An earlier patch was tested by Thomas Huth; this one still needs to
be retested, but I am sending it out already.
I just finished testing your new version of these patches here, and I can
confirm that they fix the problem that I was facing, so:
Tested-by: Thomas Huth <thuth@xxxxxxxxxx>
FWIW, the problem can be reproduced by running a KVM guest on an s390x host
like this:
qemu-system-s390x -accel kvm -nographic -m 4G -d guest_errors \
-M s390-ccw-virtio,memory-backend=mem-machine_mem \
-object memory-backend-file,size=4294967296,prealloc=true,mem-path=$HOME/myfile,share=true,id=mem-machine_mem
Without the fix, the guest crashes immediately, before being able to execute
the first instruction. With the fix applied, you can still see the first
messages of the guest firmware, indicating that the guest started successfully.
Thank you very much for the fix, David!
Thomas