On 22/06/20 18:33, Tom Lendacky wrote:
> I'm not a big fan of trapping #PF for this. Can't this have a performance
> impact on the guest? If I'm not mistaken, Qemu will default to TCG
> physical address size (40-bits), unless told otherwise, causing #PF to now
> be trapped. Maybe libvirt defaults to matching host/guest CPU MAXPHYADDR?

Yes, this is true. We should change it similar to how we handle TSC
frequency (and having support for guest MAXPHYADDR < host MAXPHYADDR is
a prerequisite).

> In bare-metal, there's no guarantee a CPU will report all the faults in a
> single PF error code. And because of race conditions, software can never
> rely on that behavior. Whenever the OS thinks it has cured an error, it
> must always be able to handle another #PF for the same access when it
> retries because another processor could have modified the PTE in the
> meantime.

I agree, but I don't understand the relation to this patch. Can you
explain?

> What's the purpose of reporting RSVD in the error code in the
> guest in regards to live migration?
>
>> - if the page is accessible to the guest according to the permissions in
>> the page table, it will cause a #NPF. Again, we need to trap it, check
>> the guest physical address and inject a P|RSVD #PF if the guest physical
>> address has any guest-reserved bits.
>>
>> The AMD specific issue happens in the second case. By the time the NPF
>> vmexit occurs, the accessed and/or dirty bits have been set and this
>> should not have happened before the RSVD page fault that we want to
>> inject. On Intel processors, instead, EPT violations trigger before
>> accessed and dirty bits are set. I cannot find an explicit mention of
>> the intended behavior in either the
>> Intel SDM or the AMD APM.
>
> Section 15.25.6 of the AMD APM volume 2 talks about page faults (nested vs
> guest) and fault ordering. It does talk about setting guest A/D bits
> during the walk, before an #NPF is taken. I don't see any way around that
> given a virtual MAXPHYADDR in the guest being less than the host MAXPHYADDR.

Right you are... Then this behavior cannot be implemented on AMD.

Paolo
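
For reference, the check described in the quoted text (look at the guest
physical address and inject a P|RSVD #PF if it has any guest-reserved bits)
can be sketched roughly as below. This is only an illustration, not the code
from the patch series: the helper names are hypothetical, the guest
MAXPHYADDR is assumed to be between 1 and 52, and the only architectural
facts used are the P (bit 0) and RSVD (bit 3) page-fault error code bits and
the 52-bit limit on x86-64 physical addresses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 page-fault error code bits. */
#define PFERR_PRESENT (1u << 0)   /* P: fault on a present page */
#define PFERR_RSVD    (1u << 3)   /* RSVD: reserved bit was set */

/*
 * Hypothetical helper: mask of guest-physical address bits that the
 * guest must treat as reserved, i.e. bits guest_maxphyaddr..51.
 * Assumes 1 <= guest_maxphyaddr <= 52.
 */
static uint64_t guest_rsvd_gpa_mask(unsigned int guest_maxphyaddr)
{
    uint64_t all_bits   = (1ULL << 52) - 1;               /* bits 0..51 */
    uint64_t legal_bits = (1ULL << guest_maxphyaddr) - 1; /* bits below MAXPHYADDR */

    return all_bits & ~legal_bits;
}

/* True if the GPA uses any bit at or above the guest's MAXPHYADDR. */
static bool gpa_has_guest_rsvd_bits(uint64_t gpa, unsigned int guest_maxphyaddr)
{
    return (gpa & guest_rsvd_gpa_mask(guest_maxphyaddr)) != 0;
}

/*
 * Hypothetical helper: error code for the #PF to inject after trapping
 * the nested fault, or 0 if the GPA is legal for the guest and the
 * fault should be handled as an ordinary nested fault instead.
 */
static uint32_t rsvd_pf_error_code(uint64_t gpa, unsigned int guest_maxphyaddr)
{
    if (gpa_has_guest_rsvd_bits(gpa, guest_maxphyaddr))
        return PFERR_PRESENT | PFERR_RSVD;

    return 0;
}
```

Note that this only covers the address check. It does nothing about the
ordering problem discussed above: on AMD the guest A/D bits have already
been set by the time the #NPF exit is taken.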