For shared memory we really need it.
Private file-backed is weird ... because we don't know if the shared or
the private page is problematic ... :(
I agree with you, and we have to decide when should we bail out if
ram_block_discard_range() doesn't work.
According to me, if discard doesn't work and we are dealing with
file-backed largepages (shared or not) we have to exit, because the
fallocate is mandatory. It is the case with hugetlbfs.
> > In the non-file-backed case, or the file-backed non-largepage private
case, according to me we can trust the mmap() method to put everything
back in place for the VM reset to work as expected.
Are there aspects I don't see, and for which mmap + the remap handler is
not sufficient and we should also bail out here ?
mmap() will only zap anonymous pages, no pagecache pages. See below.
Maybe we should just do:
if (block->fd >= 0) {
/* mmap(MAP_FIXED) cannot reliably zap our problematic page. */
error_report(...);
exit(-1);
}
Or alternatively
if (block->fd >= 0 && qemu_ram_is_shared(block)) {
/* mmap() cannot possibly zap our problematic page. */
error_report(...);
exit(-1);
} else if (block->fd >= 0) {
/*
* MAP_PRIVATE file-backed ... mmap() can only zap the private
* page, not the shared one ... we don't know which one is
* problematic.
*/
warn_report(...);
}
I also agree that any file-backed/shared case should bail out if discard
(fallocate) fails, no mater large or standard pages are used.
In the case of file-backed private standard pages, I think that a poison
on the private page can be fixed with a new mmap.
According to me, there are 2 cases to consider: at the moment the poison
is seen, the page was dirty (so it means that it was a pure private
page), or the page was not dirty, and in this case the poison could
replace this non-dirty page with a new copy of the file content.
In both cases, I'd say that the remap should clean up the poison.
Let's assume we have mmap(MAP_RIVATE, fd). The following scenarios are
possible:
(a) We only have a pagecache page (never written) that is poisoned
-> mmap(MAP_FIXED) cannot resolve that
(b) We only have an anonymous page (e.g., pagecache truncated, or if the
hugetlb file was empty) that is poisoned
-> mmap(MAP_FIXED) can resolve that
(c) We have an anonymous and a pagecache page (written -> COW).
(c1) Anonymous page is poisoned -> mmap(MAP_FIXED) can resolve that
(c2) Pagecache page is poisoned -> mmap(MAP_FIXED) cannot resolve that
So mmap(MAP_FIXED) cannot sort out all cases. In practice, (a) and (c2)
are uncommon, but possible.
(b) is common with hugetlb. (a) and (c) are uncommon with hugetlb, just
because of the nature of hugetlb pages being a scarce resource.
And IIRC, (b) with hugetlb should should be sorted out with
mmap(MAP_FIXED). Please double-check.
So the conditions when discard fails, could be something like:
if (block->fd >= 0 && (qemu_ram_is_shared(block) ||
(length > TARGET_PAGE_SIZE))) {
/* punch hole is mandatory, mmap() cannot possibly zap our page*/
error_report("%spage recovery failure addr: "
RAM_ADDR_FMT "@" RAM_ADDR_FMT "",
(length > TARGET_PAGE_SIZE) ? "large " : "",
length, addr);
I'm not sure if we should be special-casing hugetlb.
If we want to be 100% sure, we will do
if (block->fd >= 0) {
error_report();
exit(1);
}
But we could decide to be "nice" to hugetlb and assume (b) for them
above: that is, we would do
/*
* mmap() cannot zap pagecache pages, only anonymous pages. As soon as
* we might have pagecache pages involved (either private or shared
* mapping), we must be careful. However, MAP_PRIVATE on empty hugetlb
* files is common, and extremely uncommon on non-empty hugetlb files,
* so we'll special-case them here.
*/
if (block->fd >= 0 && (qemu_ram_is_shared(block) ||
length == TARGET_PAGE_SIZE))) {
...
}
[in practice, we could use /proc/self/pagemap to see if we map an
anonymous page ... but I'd rather not go down that path just yet]
But, in the end the expectation is that madvise()+fallocate() will
usually not fail.
--
Cheers,
David / dhildenb