On 06.12.24 19:26, William Roche wrote:
On 12/3/24 16:00, David Hildenbrand wrote:
On 03.12.24 15:39, William Roche wrote:
[...]
Our new Qemu code is testing first the fallocate+MADV_DONTNEED procedure
for standard sized pages (in ram_block_discard_range()) and only folds
back to the mmap() use if it fails. So maybe my proposal to implement:
+ /*
+ * Fold back to using mmap(), but it should not
repair a
+ * shared file memory region. In this case we fail.
+ */
+ if (block->fd >= 0 && qemu_ram_is_shared(block)) {
+ error_report("Shared memory poison recovery
failure addr: "
+ RAM_ADDR_FMT "@" RAM_ADDR_FMT "",
+ length, addr);
+ exit(1);
+ }
Could be the right choice.
Right. But then, what about a mmap(MAP_PRIVATE, shmem), where the
pagecache page is poisoned and needs an explicit fallocate? :)
It's all tricky. I wonder if we should just say "if it's backed by a
file, and we cannot discard, then mmap() can't fix it reliably".
if (block->fd >= 0) {
...
}
After all, we don't even expect the fallocate/MADV_DONTNEED to ever
fail :) So I was also wondering if we could get rid of the
mmap(MAP_FIXED) completely ... but who knows what older Linux kernels do.
Hi,
I agree that we expect the ram_block_discard_range() function to be
working on recent kernels, and to do what's necessary to reuse the
poisoned memory area.
The case where older kernels could be a problem is where fallocate()
would not work on a given file
In that case we cannot possibly handle it correctly in all with
mmap(MAP_FIXED), especially not with shmem/hugetlb :/
So failing is correct.
, or madvice(MADV_DONTNEED or MADV_REMOVE)
would not work on standard sized pages.
I assume both can be expected to work in any reasonable Linux setup
today (-hugetlb) :)
As ram_block_discard_range()
currently avoids using madvise on huge pages.
Right, because support might not be around for hugetlb.
In this case, the generic/minimal way to make the memory usable again
(in all cases) would be to:
- fallocate/PUNCH_HOLE the given file (if any)
- and remap the area
even if it's not _mandatory_ in all cases.
Would you like me to add an fallocate(PUNCH_HOLE) call in the helper
function qemu_ram_remap_mmap() when a file descriptor is provided
(before remapping the area) ?
> > This way, we don't need to know if ram_block_discard_range() failed on
the fallocate() or the madvise(); in the worst case scenario, we would
PUNCH twice. If fallocate fails or mmap fails, we exit.
I haven't seen a problem punching a file twice - do you see any problem ?
Hm. I'd like to avoid another fallocate(). It would really only make
sense if we expect fallocate() to work but madvise() to fail; and I
don't think that's our expectation.
virtio-balloon has been using ram_block_discard_range() forever, and so
far nobody really ever reported any of the errors from the function to
the best of my knowledge ...
Do you find this possibility acceptable ? Or should I just go for the
immediate failure when ram_block_discard_range() fails on a case with a
file descriptor as you suggest ?
Please let me know if you find any problem with this approach, as it
could help to have this poison recovery scenario to work on more kernels.
I'd say, let's keep it simple. Try ram_block_discard_range() if that
fails (a) and we have a file, bail out, (b) we don't have a file, do the
remap().
If we ever run into issues with that approach, we can investigate why it
fails and what to do about it (e.g., fallocate).
My best guess it that the whole remap part can be avoided completely.
--
Cheers,
David / dhildenb