On 12/3/24 16:00, David Hildenbrand wrote:
On 03.12.24 15:39, William Roche wrote:
[...]
Our new Qemu code is testing first the fallocate+MADV_DONTNEED procedure
for standard sized pages (in ram_block_discard_range()) and only folds
back to the mmap() use if it fails. So maybe my proposal to implement:
+ /*
+ * Fold back to using mmap(), but it should not
repair a
+ * shared file memory region. In this case we fail.
+ */
+ if (block->fd >= 0 && qemu_ram_is_shared(block)) {
+ error_report("Shared memory poison recovery
failure addr: "
+ RAM_ADDR_FMT "@" RAM_ADDR_FMT "",
+ length, addr);
+ exit(1);
+ }
Could be the right choice.
Right. But then, what about a mmap(MAP_PRIVATE, shmem), where the
pagecache page is poisoned and needs an explicit fallocate? :)
It's all tricky. I wonder if we should just say "if it's backed by a
file, and we cannot discard, then mmap() can't fix it reliably".
if (block->fd >= 0) {
...
}
After all, we don't even expect the fallocate/MADV_DONTNEED to ever
fail :) So I was also wondering if we could get rid of the
mmap(MAP_FIXED) completely ... but who knows what older Linux kernels do.
I agree that we expect the ram_block_discard_range() function to be
working on recent kernels, and to do what's necessary to reuse the
poisoned memory area.
The case where older kernels could be a problem is where fallocate()
would not work on a given file, or madvice(MADV_DONTNEED or MADV_REMOVE)
would not work on standard sized pages. As ram_block_discard_range()
currently avoids using madvise on huge pages.
In this case, the generic/minimal way to make the memory usable again
(in all cases) would be to:
- fallocate/PUNCH_HOLE the given file (if any)
- and remap the area
even if it's not _mandatory_ in all cases.
Would you like me to add an fallocate(PUNCH_HOLE) call in the helper
function qemu_ram_remap_mmap() when a file descriptor is provided
(before remapping the area) ?
This way, we don't need to know if ram_block_discard_range() failed on
the fallocate() or the madvise(); in the worst case scenario, we would
PUNCH twice. If fallocate fails or mmap fails, we exit.
I haven't seen a problem punching a file twice - do you see any problem ?
Do you find this possibility acceptable ? Or should I just go for the
immediate failure when ram_block_discard_range() fails on a case with a
file descriptor as you suggest ?
Please let me know if you find any problem with this approach, as it
could help to have this poison recovery scenario to work on more kernels.
Thanks in advance for your feedback.
William.