On 11/12/24 12:13, David Hildenbrand wrote:
On 07.11.24 11:21, William Roche wrote:
From: William Roche <william.roche@xxxxxxxxxx>
When an entire large page is impacted by an error (hugetlbfs case),
better report the size and location of this large memory hole by
giving a warning message when this page is first hit:
Memory error: Loosing a large page (size: X) at QEMU addr Y and GUEST
addr Z
Hm, I wonder if we really want to special-case hugetlb here.
Why not make the warning independent of the underlying page size?
We already have a warning provided by QEMU (in kvm_arch_on_sigbus_vcpu()):
Guest MCE Memory Error at QEMU addr Y and GUEST addr Z of type
BUS_MCEERR_AR/_AO injected
The one I suggest is an additional message, printed before the message
above.
Here is an example:
qemu-system-x86_64: warning: Memory error: Loosing a large page (size:
2097152) at QEMU addr 0x7fdd7d400000 and GUEST addr 0x11600000
qemu-system-x86_64: warning: Guest MCE Memory Error at QEMU addr
0x7fdd7d400000 and GUEST addr 0x11600000 of type BUS_MCEERR_AO injected
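To make the example concrete, here is a minimal standalone sketch of how such a message could be built. It is not the code from the patch: page_align_down() and format_large_page_warning() are hypothetical names, and it assumes that the reported addresses are the start of the impacted large page (which the aligned addresses in the output above suggest) and that the page size is a power of two.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not a QEMU function: round addr down to the start
 * of the page that contains it. page_size must be a power of two.
 */
static uint64_t page_align_down(uint64_t addr, uint64_t page_size)
{
    return addr & ~(page_size - 1);
}

/*
 * Hypothetical helper: format the proposed warning text for the large page
 * containing the faulting address. The message wording is copied from the
 * example output above. Returns the snprintf() result.
 */
static int format_large_page_warning(char *buf, size_t len, uint64_t page_size,
                                     uint64_t host_addr, uint64_t guest_addr)
{
    return snprintf(buf, len,
                    "Memory error: Loosing a large page (size: %" PRIu64 ") "
                    "at QEMU addr 0x%" PRIx64 " and GUEST addr 0x%" PRIx64,
                    page_size,
                    page_align_down(host_addr, page_size),
                    page_align_down(guest_addr, page_size));
}
```

With a 2 MiB page (2097152 bytes), any faulting address inside the page maps to the same page start, so the message identifies the whole lost range rather than a single 4 KiB location.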
In my opinion, this additional message for the large page case will help
explain the sudden proliferation of memory errors that QEMU is likely to
report on the impacted range.
Not only will it help the machine administrator see that a single
memory error had this large an impact, it can also help us better
measure the impact of fixing the large page memory error support in the
field (in the future).
These are some of the reasons why I think this large-page-specific
message can be useful.