On 03.12.24 15:39, William Roche wrote:
On 12/3/24 15:08, David Hildenbrand wrote:
[...]
Let me take a look at your tool below to see if I can find an explanation
of what is happening, because it's weird :)
[...]
At the end of this email, I included the source code of a simplistic
test case that shows that the page is replaced in the case of standard
page size.
The idea of this test is simple:
1/ Create a local FILE with:
# dd if=/dev/zero of=./FILE bs=4k count=2
2+0 records in
2+0 records out
8192 bytes (8.2 kB, 8.0 KiB) copied, 0.000337674 s, 24.3 MB/s
2/ As root run:
# ./poisonedShared4k
Mapping 8192 bytes from file FILE
Reading and writing the first 2 pages content:
Read: Read: Wrote: Initial mem page 0
Wrote: Initial mem page 1
Data pages at 0x7f71a19d6000 physically 0x124fb0000
Data pages at 0x7f71a19d7000 physically 0x128ce4000
Poisoning 4k at 0x7f71a19d6000
Signal 7 received
code 4 Signal code
addr 0x7f71a19d6000 Memory location
si_addr_lsb 12
siglongjmp used
Remapping the poisoned page
Reading and writing the first 2 pages content:
Read: Read: Initial mem page 1
Wrote: Rewrite mem page 0
Wrote: Rewrite mem page 1
Data pages at 0x7f71a19d6000 physically 0x10c367000
Data pages at 0x7f71a19d7000 physically 0x128ce4000
---
As we can see, this process:
- maps the FILE,
- tries to read and write the beginning of the first 2 pages
- gives their physical addresses
- poisons the first page with a madvise(MADV_HWPOISON) call
- shows the SIGBUS signal received and recovers from it
- simply remaps the same page from the file
- tries again to read and write the beginning of the first 2 pages
- gives their physical addresses
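The "simply remaps the same page from the file" step can be sketched as below. This is a minimal illustration, not the poisonedShared4k source: the helper names remap_page_from_file() and demo_remap() are hypothetical, a temporary file stands in for ./FILE, and nothing is poisoned, so it only shows the mmap(MAP_FIXED) call that replaces one page of a MAP_SHARED mapping in place. Whether the same call yields a healthy page after a real poison event depends on the filesystem.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Replace the page at page_addr with a fresh MAP_SHARED mapping of the
 * same file offset. MAP_FIXED discards whatever was mapped there before.
 * Returns 0 on success, -1 if mmap() fails.
 */
int remap_page_from_file(void *page_addr, size_t page_size, int fd, off_t file_off)
{
    void *p = mmap(page_addr, page_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED, fd, file_off);
    return p == MAP_FAILED ? -1 : 0;
}

/*
 * Demo on an unpoisoned temporary file (an assumption, standing in for
 * ./FILE). Returns 0 if the remapped page still shows the file content,
 * 1 on a mismatch, -1 on a setup error.
 */
int demo_remap(void)
{
    char path[] = "/tmp/remap-demo-XXXXXX";
    long psz = sysconf(_SC_PAGESIZE);
    assert(psz > 0);

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                       /* file lives on until fd is closed */
    if (ftruncate(fd, 2 * psz) != 0) {
        close(fd);
        return -1;
    }

    char *base = mmap(NULL, 2 * psz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) {
        close(fd);
        return -1;
    }
    base[0] = 'A';                      /* first page */
    base[psz] = 'B';                    /* second page */

    /* Remap only the first page over its current virtual address. */
    int rc = remap_page_from_file(base, (size_t)psz, fd, 0);
    int ok = (rc == 0 && base[0] == 'A' && base[psz] == 'B');

    munmap(base, 2 * psz);
    close(fd);
    return ok ? 0 : 1;
}
```

Calling demo_remap() should return 0 when the temporary file can be created: the virtual address is unchanged and the data written through the old mapping is still visible through the new one, because both refer to the same file pages.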
Turns out the code will try to truncate the pagecache page using
mapping->a_ops->error_remove_folio().
That, however, is only implemented on *some* filesystems.
Most prominently, it is not implemented on shmem.
So if you run your test with shmem (e.g., /tmp/FILE), it doesn't work.
Correct, on tmpfs the test case fails to continue to use the memory area
and gets a SIGBUS. And it works with xfs.
Using fallocate+MADV_DONTNEED seems to work on shmem.
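A minimal sketch of the fallocate+MADV_DONTNEED discard on a shmem-backed mapping follows. It assumes a memfd as a stand-in for a tmpfs file, the helper names discard_file_range() and demo_discard_on_memfd() are hypothetical, and it is not QEMU's ram_block_discard_range(). It runs on an unpoisoned page, so it only shows that punching a hole drops the old pagecache page and that a later access sees a fresh zero-filled page.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>      /* fallocate(), FALLOC_FL_* */
#include <sys/mman.h>   /* mmap(), madvise(), memfd_create() */
#include <sys/types.h>
#include <unistd.h>

/*
 * Drop the backing page(s) of [file_off, file_off + len) from the file,
 * then drop the corresponding range of the mapping at addr.
 * Returns 0 on success, -1 if either call fails.
 */
int discard_file_range(int fd, void *addr, off_t file_off, size_t len)
{
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  file_off, (off_t)len) != 0)
        return -1;
    if (madvise(addr, len, MADV_DONTNEED) != 0)
        return -1;
    return 0;
}

/*
 * Demo on a two-page memfd.
 * Returns 0 if the discarded page reads back as zero and its neighbour
 * is untouched, 1 if the discard itself failed, 2 on unexpected data,
 * -1 on a setup error.
 */
int demo_discard_on_memfd(void)
{
    long psz = sysconf(_SC_PAGESIZE);
    assert(psz > 0);

    int fd = memfd_create("discard-demo", 0);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, 2 * psz) != 0) {
        close(fd);
        return -1;
    }

    char *p = mmap(NULL, 2 * psz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }
    p[0] = 'A';
    p[psz] = 'B';

    int result;
    if (discard_file_range(fd, p, 0, (size_t)psz) != 0)
        result = 1;                     /* e.g. hole punching unsupported */
    else if (p[0] == 0 && p[psz] == 'B')
        result = 0;                     /* first page replaced, second kept */
    else
        result = 2;

    munmap(p, 2 * psz);
    close(fd);
    return result;
}
```

The hole punch is what releases the pagecache page; the MADV_DONTNEED call additionally matters for MAP_PRIVATE mappings, where a private copy of the page may exist.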
Our new QEMU code first tries the fallocate+MADV_DONTNEED procedure
for standard-sized pages (in ram_block_discard_range()) and only falls
back to mmap() if that fails. So maybe my proposal to implement:
+    /*
+     * Fold back to using mmap(), but it should not repair a
+     * shared file memory region. In this case we fail.
+     */
+    if (block->fd >= 0 && qemu_ram_is_shared(block)) {
+        error_report("Shared memory poison recovery failure addr: "
+                     RAM_ADDR_FMT "@" RAM_ADDR_FMT "",
+                     length, addr);
+        exit(1);
+    }
Could be the right choice.
Right. But then, what about a mmap(MAP_PRIVATE, shmem), where the
pagecache page is poisoned and needs an explicit fallocate? :)
It's all tricky. I wonder if we should just say "if it's backed by a
file, and we cannot discard, then mmap() can't fix it reliably".
if (block->fd >= 0) {
...
}
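The rule "if it's backed by a file, and we cannot discard, then mmap() can't fix it reliably" can be written as a small decision function. This is only an illustration of the policy under discussion: the enum, the function name choose_recovery(), and its parameters are hypothetical and do not exist in QEMU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes of a poison recovery attempt. */
enum recovery_action {
    RECOVERY_DISCARD_OK,    /* fallocate/MADV_DONTNEED worked, nothing else to do */
    RECOVERY_REMAP,         /* anonymous memory: mmap(MAP_FIXED) can replace the page */
    RECOVERY_FAIL           /* file-backed and discard failed: give up */
};

/*
 * fd         - backing file descriptor, or -1 for anonymous memory
 * discard_ok - whether the discard step succeeded
 */
enum recovery_action choose_recovery(int fd, bool discard_ok)
{
    if (discard_ok)
        return RECOVERY_DISCARD_OK;
    if (fd >= 0)
        return RECOVERY_FAIL;   /* shared or private, the pagecache page stays poisoned */
    return RECOVERY_REMAP;
}
```

Compared with the earlier check that also requires a shared mapping, this variant treats MAP_PRIVATE file mappings the same way, which covers the mmap(MAP_PRIVATE, shmem) case raised above.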
After all, we don't even expect the fallocate/MADV_DONTNEED to ever fail
:) So I was also wondering if we could get rid of the mmap(MAP_FIXED)
completely ... but who knows what older Linux kernels do.
--
Cheers,
David / dhildenb