Quoting Claudio Imbrenda (2023-11-02 16:35:49)
> When the host invalidates a guest page, it will also check if the page
> was used to map the prefix of any guest CPUs, in which case they are
> stopped and marked as needing a prefix refresh. Upon starting the
> affected CPUs again, their prefix pages are explicitly faulted in and
> revalidated if they had been invalidated. A bit in the PGSTEs indicates
> whether or not a page might contain a prefix. The bit is allowed to
> overindicate. Pages above 2G are skipped, because they cannot be
> prefixes, since KVM runs all guests with MSO = 0.
>
> The same applies for nested guests (VSIE). When the host invalidates a
> guest page that maps the prefix of the nested guest, it has to stop the
> affected nested guest CPUs and mark them as needing a prefix refresh.
> The same PGSTE bit used for the guest prefix is also used for the
> nested guest. Pages above 2G are skipped like for normal guests, which
> is the source of the bug.
>
> The nested guest runs in the guest primary address space. The guest
> could be running the nested guest using MSO != 0. If the MSO + prefix
> for the nested guest is above 2G, the check for nested prefix will skip
> it. This will cause the invalidation notifier to not stop the CPUs of
> the nested guest and not mark them as needing refresh. When the nested
> guest is run again, its prefix will not be refreshed, since it has not
> been marked for refresh. This will cause a fatal validity intercept
> with VIR code 37.
>
> Fix this by removing the check for 2G for nested guests. Now all
> invalidations of pages with the notify bit set will always scan the
> existing VSIE shadow state descriptors.
>
> This allows to catch invalidations of nested guest prefix mappings even
> when the prefix is above 2G in the guest virtual address space.
>
> Signed-off-by: Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx>

Tested-by: Nico Boehr <nrb@xxxxxxxxxxxxx>
Reviewed-by: Nico Boehr <nrb@xxxxxxxxxxxxx>
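
For readers following the thread, here is a minimal, self-contained sketch of the address arithmetic described in the quoted commit message. It is not the kernel code: the function names and the `TWO_GB` constant are hypothetical and only model the decision "does the notifier scan for a nested prefix at this address". It assumes the skip condition is a plain comparison of the invalidated address against 2G.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: "2G" in the commit message means the address 1 << 31. */
#define TWO_GB (1ULL << 31)

/* Hypothetical helper: the guest address at which the nested guest's
 * prefix lives is the nested guest's MSO plus its prefix. */
static uint64_t nested_prefix_guest_addr(uint64_t mso, uint64_t prefix)
{
    return mso + prefix;
}

/* Model of the behaviour before the fix: invalidations at or above 2G
 * are skipped, so a nested prefix located there is never noticed. */
static bool old_scans_nested_prefix(uint64_t invalidated_addr)
{
    return invalidated_addr < TWO_GB;
}

/* Model of the behaviour after the fix: the 2G check is gone, so every
 * invalidation with the notify bit set leads to a scan. */
static bool new_scans_nested_prefix(uint64_t invalidated_addr)
{
    (void)invalidated_addr; /* the address no longer affects the decision */
    return true;
}
```

With a nested guest run at MSO = 0x80000000 and a prefix of 0x2000 (illustrative values), `nested_prefix_guest_addr()` yields 0x80002000, which the old model skips and the new model scans. With MSO = 0 the same prefix stays below 2G and both models scan it, which is why the problem only shows up when the guest uses a non-zero MSO for its nested guest.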