On 6/19/20 6:07 PM, Paolo Bonzini wrote:
> On 19/06/20 23:52, Tom Lendacky wrote:
>>> A more subtle issue is when the host MAXPHYADDR is larger than that
>>> of the guest. Page faults caused by reserved bits on the guest won't
>>> cause an EPT violation/NPF and hence we also check guest MAXPHYADDR
>>> and add PFERR_RSVD_MASK error code to the page fault if needed.
>>
>> I'm probably missing something here, but I'm confused by this
>> statement. Is this for a case where a page has been marked not
>> present and the guest has also set what it believes are reserved
>> bits? Then when the page is accessed, the guest sees a page fault
>> without the error code for reserved bits?
>
> No, for non-present page there is no issue because there are no reserved
> bits in that case. If the page is present and no reserved bits are set
> according to the host, however, there are two cases to consider:
>
> - if the page is not accessible to the guest according to the
> permissions in the page table, it will cause a #PF. We need to trap it
> and change the error code into P|RSVD if the guest physical address has
> any guest-reserved bits.

I'm not a big fan of trapping #PF for this. Can't this have a performance
impact on the guest? If I'm not mistaken, QEMU will default to the TCG
physical address size (40 bits) unless told otherwise, causing #PF to now
be trapped. Maybe libvirt defaults to matching the host/guest CPU
MAXPHYADDR?

On bare metal, there's no guarantee that a CPU will report all the faults
in a single #PF error code. And because of race conditions, software can
never rely on that behavior. Whenever the OS thinks it has cured an error,
it must always be able to handle another #PF for the same access when it
retries, because another processor could have modified the PTE in the
meantime.

What's the purpose of reporting RSVD in the guest's error code with regard
to live migration?
>
> - if the page is accessible to the guest according to the permissions in
> the page table, it will cause a #NPF. Again, we need to trap it, check
> the guest physical address and inject a P|RSVD #PF if the guest physical
> address has any guest-reserved bits.
>
> The AMD specific issue happens in the second case. By the time the NPF
> vmexit occurs, the accessed and/or dirty bits have been set and this
> should not have happened before the RSVD page fault that we want to
> inject. On Intel processors, instead, EPT violations trigger before
> accessed and dirty bits are set. I cannot find an explicit mention of
> the intended behavior in either the Intel SDM or the AMD APM.

Section 15.25.6 of AMD APM Volume 2 discusses page faults (nested vs.
guest) and fault ordering. It does mention that guest A/D bits are set
during the walk, before an #NPF is taken. I don't see any way around that
when the guest's virtual MAXPHYADDR is smaller than the host MAXPHYADDR.

Thanks,
Tom

>
> Paolo
>