On 6/22/20 12:03 PM, Paolo Bonzini wrote:
> On 22/06/20 18:33, Tom Lendacky wrote:
>> I'm not a big fan of trapping #PF for this. Can't this have a performance
>> impact on the guest? If I'm not mistaken, Qemu will default to TCG
>> physical address size (40-bits), unless told otherwise, causing #PF to now
>> be trapped. Maybe libvirt defaults to matching host/guest CPU MAXPHYADDR?
> 
> Yes, this is true. We should change it similar to how we handle TSC
> frequency (and having support for guest MAXPHYADDR < host MAXPHYADDR is
> a prerequisite).
> 
>> In bare-metal, there's no guarantee a CPU will report all the faults in a
>> single PF error code. And because of race conditions, software can never
>> rely on that behavior. Whenever the OS thinks it has cured an error, it
>> must always be able to handle another #PF for the same access when it
>> retries because another processor could have modified the PTE in the
>> meantime.
> 
> I agree, but I don't understand the relation to this patch. Can you
> explain?

I guess I'm trying to understand why RSVD has to be reported to the guest
on a #PF (vs an NPF) when there's no guarantee that it can receive that
error code today even when guest MAXPHYADDR == host MAXPHYADDR. That would
eliminate the need to trap #PF.

Thanks,
Tom

> 
>> What's the purpose of reporting RSVD in the error code in the
>> guest in regards to live migration?
>>
>>> - if the page is accessible to the guest according to the permissions in
>>> the page table, it will cause a #NPF. Again, we need to trap it, check
>>> the guest physical address and inject a P|RSVD #PF if the guest physical
>>> address has any guest-reserved bits.
>>>
>>> The AMD specific issue happens in the second case. By the time the NPF
>>> vmexit occurs, the accessed and/or dirty bits have been set and this
>>> should not have happened before the RSVD page fault that we want to
>>> inject. On Intel processors, instead, EPT violations trigger before
>>> accessed and dirty bits are set. I cannot find an explicit mention of
>>> the intended behavior in either the Intel SDM or the AMD APM.
>>
>> Section 15.25.6 of the AMD APM volume 2 talks about page faults (nested vs
>> guest) and fault ordering. It does talk about setting guest A/D bits
>> during the walk, before an #NPF is taken. I don't see any way around that
>> given a virtual MAXPHYADDR in the guest being less than the host MAXPHYADDR.
> 
> Right you are... Then this behavior cannot be implemented on AMD.
> 
> Paolo
> 
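For readers following along, below is a minimal sketch of the #NPF-side check
Paolo describes: if the guest physical address sets bits at or above the
guest's advertised MAXPHYADDR, reflect the access back into the guest as a
present+reserved (P|RSVD) #PF. This is illustrative only, not the actual KVM
patch; handle_npf(), inject_page_fault() and guest_maxphyaddr are made-up
placeholders, and only the PFERR bit values are architectural.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Architectural x86 #PF error-code bits. */
#define PFERR_PRESENT (1ULL << 0)  /* fault on a present translation */
#define PFERR_RSVD    (1ULL << 3)  /* reserved bit set in a paging structure */

/* Placeholder for injecting a #PF into the guest (not a real KVM API). */
static void inject_page_fault(uint64_t gpa, uint64_t error_code)
{
	printf("inject #PF: gpa=0x%llx ec=0x%llx\n",
	       (unsigned long long)gpa, (unsigned long long)error_code);
}

/* True if the GPA uses bits the guest's CPUID says are reserved. */
static bool gpa_has_guest_reserved_bits(uint64_t gpa, unsigned int guest_maxphyaddr)
{
	return gpa & ~((1ULL << guest_maxphyaddr) - 1);
}

/* Sketch of the #NPF path discussed above. */
static void handle_npf(uint64_t gpa, unsigned int guest_maxphyaddr)
{
	if (gpa_has_guest_reserved_bits(gpa, guest_maxphyaddr)) {
		/* Reflect the access back to the guest as a P|RSVD #PF. */
		inject_page_fault(gpa, PFERR_PRESENT | PFERR_RSVD);
		return;
	}
	/* ...otherwise resolve the nested fault normally... */
}

int main(void)
{
	/* Guest advertises a 40-bit MAXPHYADDR; an access at 1 TiB sets a reserved bit. */
	handle_npf(1ULL << 40, 40);
	return 0;
}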