On Fri, Jul 01, 2022, Xiaoyao Li wrote:
> On 7/1/2022 6:21 AM, Michael Roth wrote:
> > On Thu, Jun 30, 2022 at 12:14:13PM -0700, Vishal Annapurve wrote:
> > > With transparent_hugepages=always setting I see issues with the
> > > current implementation.

...

> > > Looks like with transparent huge pages enabled kvm tried to handle
> > > the shared memory fault on 0x84d gfn by coalescing nearby 4K pages
> > > to form a contiguous 2MB page mapping at gfn 0x800, since level 2
> > > was requested in kvm_mmu_spte_requested.
> > > This caused the private memory contents from regions 0x800-0x84c
> > > and 0x86e-0xa00 to get unmapped from the guest, leading to guest vm
> > > shutdown.
> >
> > Interesting... seems like that wouldn't be an issue for non-UPM SEV,
> > since the private pages would still be mapped as part of that 2M
> > mapping, and it's completely up to the guest as to whether it wants
> > to access as private or shared. But for UPM it makes sense this
> > would cause issues.
> >
> > > Does getting the mapping level as per the fault access type help
> > > address the above issue? Any such coalescing should not cross
> > > between private to shared or shared to private memory regions.
> >
> > Doesn't seem like changing the check to fault->is_private would help
> > in your particular case, since the subsequent host_pfn_mapping_level()
> > call only seems to limit the mapping level to whatever the mapping
> > level is for the HVA in the host page table.
> >
> > Seems like with UPM we need some additional handling here that also
> > checks that the entire 2M HVA range is backed by non-private memory.
> >
> > Non-UPM SNP hypervisor patches already have a similar hook added to
> > host_pfn_mapping_level() which implements such a check via the RMP
> > table, so UPM might need something similar:
> >
> > https://github.com/AMDESE/linux/commit/ae4475bc740eb0b9d031a76412b0117339794139
> >
> > -Mike
>
> For TDX, we try to track the page type (shared, private, mixed) of
> each gfn at a given level. Only when the type is shared/private can
> it be mapped at that level. When it's mixed, i.e. it contains both
> shared pages and private pages at the given level, it has to go to
> the next smaller level.
>
> https://github.com/intel/tdx/commit/ed97f4042eb69a210d9e972ccca6a84234028cad

Hmm, so a new slot->arch.page_attr array shouldn't be necessary, KVM can
instead update slot->arch.lpage_info on shared<->private conversions.
Detecting whether a given range is partially mapped could get nasty if
KVM defers tracking to the backing store, but if KVM itself does the
tracking as was previously suggested[*], then updating lpage_info should
be relatively straightforward, e.g. use xa_for_each_range() to see if a
given 2mb/1gb range is completely covered (fully shared) or not covered
at all (fully private).

[*] https://lore.kernel.org/all/YofeZps9YXgtP3f1@xxxxxxxxxx
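E.g. a completely untested sketch of the xarray check, assuming KVM
tracks shared memory with one xarray entry per shared 4K gfn (the
kvm->shared_gfns field and the KVM_LPAGE_MIXED_FLAG bit are made-up
names for illustration; xa_for_each_range(), KVM_PAGES_PER_HPAGE() and
mmu.c's lpage_info_slot() are existing helpers):

/*
 * Hypothetical flag: steal the top bit of disallow_lpage, which is
 * otherwise a reference count, to track mixed shared/private ranges.
 */
#define KVM_LPAGE_MIXED_FLAG	BIT(31)

/*
 * Return true if the huge page spanning @gfn at @level is entirely
 * shared or entirely private, i.e. is not mixed.  Assumes a present
 * entry in the (hypothetical) kvm->shared_gfns xarray means "shared".
 */
static bool hugepage_is_not_mixed(struct kvm *kvm, gfn_t gfn, int level)
{
	unsigned long npages = KVM_PAGES_PER_HPAGE(level);
	gfn_t start = gfn & ~(npages - 1);
	unsigned long index, nr_shared = 0;
	void *entry;

	xa_for_each_range(&kvm->shared_gfns, index, entry, start,
			  start + npages - 1)
		nr_shared++;

	/* Fully private (no entries) or fully shared (all entries). */
	return !nr_shared || nr_shared == npages;
}

/*
 * Called after a shared<->private conversion to refresh the mixed
 * tracking for every huge page level that covers @gfn.
 */
static void update_mixed_tracking(struct kvm *kvm,
				  struct kvm_memory_slot *slot, gfn_t gfn)
{
	struct kvm_lpage_info *linfo;
	int level;

	for (level = PG_LEVEL_2M; level <= KVM_MAX_HUGEPAGE_LEVEL; level++) {
		linfo = lpage_info_slot(gfn, slot, level);

		if (hugepage_is_not_mixed(kvm, gfn, level))
			linfo->disallow_lpage &= ~KVM_LPAGE_MIXED_FLAG;
		else
			linfo->disallow_lpage |= KVM_LPAGE_MIXED_FLAG;
	}
}

Partial coverage then means the range is mixed and the fault has to be
mapped at the next smaller level, same as what the TDX page_attr
tracking does, but without the extra per-slot array.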