OK, I can remove the first patch, which was created to track the
kernel-provided page size and pass it to kvm_hwpoison_page_add().
We can instead handle the page size at the kvm_hwpoison_page_add()
level, since we don't rely on the kernel-provided info, only on the
RAMBlock page size.
Great!
I see that ram_block_discard_range() adds more checks before
discarding the RAM region and can also call madvise() in addition to
the fallocate punch-hole for standard-sized memory pages. Since the
range is supposed to be recreated, I'm not convinced that these
madvise calls are necessary.
They are the proper replacement for the mmap(MAP_FIXED) + fallocate.
That function handles all cases of properly discarding guest RAM.
In the case of hugetlbfs pages, ram_block_discard_range() does the
punch-hole fallocate call (and prints the warning messages).
The madvise call is only done when (rb->page_size ==
qemu_real_host_page_size()), which isn't true for hugetlbfs.
So need_madvise is false and neither the QEMU_MADV_REMOVE nor the
QEMU_MADV_DONTNEED madvise call is performed.
See my other mail regarding fallocate()+hugetlb oddities.
The warning is for MAP_PRIVATE mappings where we cannot be sure that we are
not doing harm to somebody else that is mapping the file :(
See
commit 1d44ff586f8a8e113379430750b5a0a2a3f64cf9
Author: David Hildenbrand <david@xxxxxxxxxx>
Date: Thu Jul 6 09:56:06 2023 +0200
softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping
ram_block_discard_range() cannot possibly do the right thing in
MAP_PRIVATE file mappings in the general case.
To achieve the documented semantics, we also have to punch a hole into
the file, possibly messing with other MAP_PRIVATE/MAP_SHARED mappings
of such a file.
For example, using VM templating -- see commit b17fbbe55cba ("migration:
allow private destination ram with x-ignore-shared") -- in combination with
any mechanism that relies on discarding of RAM is problematic. This
includes:
* Postcopy live migration
* virtio-balloon inflation/deflation or free-page-reporting
* virtio-mem
So at least warn that something possibly dangerous is going on
when using ram_block_discard_range() in these cases.
So the warning is the best we can do to say "this is possibly very
problematic, and it might be undesirable".
For hugetlb, users should switch to using memory-backend-memfd or
memory-backend-file,share=on.
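For illustration, the recommended invocations might look like the following (the object id, size, and hugetlbfs mount path are made-up placeholders):

```
# Recommended: shared file-backed hugetlb memory
qemu-system-x86_64 \
  -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/hugepages,share=on \
  -machine memory-backend=ram0 ...

# Or memfd-based hugetlb memory (no filesystem entry at all)
qemu-system-x86_64 \
  -object memory-backend-memfd,id=ram0,size=4G,hugetlb=on \
  -machine memory-backend=ram0 ...
```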
But we can also notice that this function will report the following
warning for all non-shared file backends:
"ram_block_discard_range: Discarding RAM in private file mappings is
possibly dangerous, because it will modify the underlying file and
will affect other users of the file"
Yes, because it's a clear warning sign that something weird is
happening. You might be throwing away data that some other process might
be relying on.
How are you making QEMU consume hugetlbs?
A classical way to consume (non-shared) hugetlbfs pages is to create
a file that is opened and mmapped by the QEMU instance, and to also
delete the filesystem entry so that the resources are released if the
QEMU instance dies. This file is usually not shared.
Right, see above. We should be using memory-backend-file,share=on with that,
just like we would with shmem/tmpfs :(
The ugly bit is that the legacy "-mem-path" option translates to
"memory-backend-file,share=off", and we cannot easily change that.
That option really should not be used anymore.
We could suppress these warnings, but let's first see how you are able
to trigger it.
The warning is always displayed when such a hugetlbfs VM impacted by a
memory error is rebooted.
I understand the reason why we have this message, but in the classical
hugetlbfs use case this (new) message on reboot is probably too
worrying... But losing memory is already very worrying ;)
See above; we cannot easily identify "we map this file MAP_PRIVATE
but we are guaranteed to be the single user", so punching a hole in that
file might just corrupt data for another user (e.g., VM templating) without
any warning.
Again, we could suppress the warning, but not using MAP_PRIVATE with
a hugetlb file would be even better.
(hugetlb contains other hacks that make sure that MAP_PRIVATE on a file
won't result in a double memory consumption -- with shmem/tmpfs it would
result in a double memory consumption!)
Are the users you are aware of using "-mem-path" or "-object memory-backend-file"?
We might be able to change the default for the latter with a new QEMU version,
maybe ...
--
Cheers,
David / dhildenb