On 7/1/2022 6:21 AM, Michael Roth wrote:
On Thu, Jun 30, 2022 at 12:14:13PM -0700, Vishal Annapurve wrote:
With transparent_hugepages=always setting I see issues with the
current implementation.
Scenario:
1) Guest accesses a gfn range 0x800-0xa00 as private
2) Guest calls mapgpa to convert the range 0x84d-0x86e as shared
3) Guest tries to access recently converted memory as shared for the first time
Guest VM shutdown is observed after step 3 -> Guest is unable to
proceed further since somehow code section is not as expected
Corresponding KVM trace logs after step 3:
VCPU-0-61883 [078] ..... 72276.115679: kvm_page_fault: address
84d000 error_code 4
VCPU-0-61883 [078] ..... 72276.127005: kvm_mmu_spte_requested: gfn
84d pfn 100b4a4d level 2
VCPU-0-61883 [078] ..... 72276.127008: kvm_tdp_mmu_spte_changed: as
id 0 gfn 800 level 2 old_spte 100b1b16827 new_spte 100b4a00ea7
VCPU-0-61883 [078] ..... 72276.127009: kvm_mmu_prepare_zap_page: sp
gen 0 gfn 800 l1 8-byte q0 direct wux nxe ad root 0 sync
VCPU-0-61883 [078] ..... 72276.127009: kvm_tdp_mmu_spte_changed: as
id 0 gfn 800 level 1 old_spte 1003eb27e67 new_spte 5a0
VCPU-0-61883 [078] ..... 72276.127010: kvm_tdp_mmu_spte_changed: as
id 0 gfn 801 level 1 old_spte 10056cc8e67 new_spte 5a0
VCPU-0-61883 [078] ..... 72276.127010: kvm_tdp_mmu_spte_changed: as
id 0 gfn 802 level 1 old_spte 10056fa2e67 new_spte 5a0
VCPU-0-61883 [078] ..... 72276.127010: kvm_tdp_mmu_spte_changed: as
id 0 gfn 803 level 1 old_spte 0 new_spte 5a0
....
VCPU-0-61883 [078] ..... 72276.127089: kvm_tdp_mmu_spte_changed: as
id 0 gfn 9ff level 1 old_spte 100a43f4e67 new_spte 5a0
VCPU-0-61883 [078] ..... 72276.127090: kvm_mmu_set_spte: gfn 800
spte 100b4a00ea7 (rwxu) level 2 at 10052fa5020
VCPU-0-61883 [078] ..... 72276.127091: kvm_fpu: unload
Looks like with transparent huge pages enabled kvm tried to handle the
shared memory fault on 0x84d gfn by coalescing nearby 4K pages
to form a contiguous 2MB page mapping at gfn 0x800, since level 2 was
requested in kvm_mmu_spte_requested.
This caused the private memory contents from regions 0x800-0x84c and
0x86e-0xa00 to get unmapped from the guest leading to guest vm
shutdown.
Interesting... seems like that wouldn't be an issue for non-UPM SEV, since
the private pages would still be mapped as part of that 2M mapping, and
it's completely up to the guest as to whether it wants to access as
private or shared. But for UPM it makes sense this would cause issues.
Does getting the mapping level as per the fault access type help
address the above issue? Any such coalescing should not cross between
private to
shared or shared to private memory regions.
Doesn't seem like changing the check to fault->is_private would help in
your particular case, since the subsequent host_pfn_mapping_level() call
only seems to limit the mapping level to whatever the mapping level is
for the HVA in the host page table.
Seems like with UPM we need some additional handling here that also
checks that the entire 2M HVA range is backed by non-private memory.
Non-UPM SNP hypervisor patches already have a similar hook added to
host_pfn_mapping_level() which implements such a check via RMP table, so
UPM might need something similar:
https://github.com/AMDESE/linux/commit/ae4475bc740eb0b9d031a76412b0117339794139
-Mike
For TDX, we try to track the page type (shared, private, mixed) of each
gfn at given level. Only when the type is shared/private, can it be
mapped at that level. When it's mixed, i.e., it contains both shared
pages and private pages at given level, it has to go to next smaller level.
https://github.com/intel/tdx/commit/ed97f4042eb69a210d9e972ccca6a84234028cad