On Mon, Dec 18, 2023 at 07:40:11PM -0800, Jim Mattson wrote:
>On Mon, Dec 18, 2023 at 6:51 PM Chao Gao <chao.gao@xxxxxxxxx> wrote:
>>
>> On Mon, Dec 18, 2023 at 07:13:27AM -0800, Sean Christopherson wrote:
>> >On Mon, Dec 18, 2023, Tao Su wrote:
>> >> When host doesn't support 5-level EPT, bits 51:48 of the guest physical
>> >> address must all be zero, otherwise an EPT violation always occurs and
>> >> current handler can't resolve this if the gpa is in RAM region. Hence,
>> >> instruction will keep being executed repeatedly, which causes infinite
>> >> EPT violation.
>> >>
>> >> Six KVM selftests are timeout due to this issue:
>> >>     kvm:access_tracking_perf_test
>> >>     kvm:demand_paging_test
>> >>     kvm:dirty_log_test
>> >>     kvm:dirty_log_perf_test
>> >>     kvm:kvm_page_table_test
>> >>     kvm:memslot_modification_stress_test
>> >>
>> >> The above selftests add a RAM region close to max_gfn, if host has 52
>> >> physical bits but doesn't support 5-level EPT, these will trigger infinite
>> >> EPT violation when access the RAM region.
>> >>
>> >> Since current Intel CPUID doesn't report max guest physical bits like AMD,
>> >> introduce kvm_mmu_tdp_maxphyaddr() to limit guest physical bits when tdp is
>> >> enabled and report the max guest physical bits which is smaller than host.
>> >>
>> >> When guest physical bits is smaller than host, some GPA are illegal from
>> >> guest's perspective, but are still legal from hardware's perspective,
>> >> which should be trapped to inject #PF. Current KVM already has a parameter
>> >> allow_smaller_maxphyaddr to support the case when guest.MAXPHYADDR <
>> >> host.MAXPHYADDR, which is disabled by default when EPT is enabled, user
>> >> can enable it when loading kvm-intel module. When allow_smaller_maxphyaddr
>> >> is enabled and guest accesses an illegal address from guest's perspective,
>> >> KVM will utilize EPT violation and emulate the instruction to inject #PF
>> >> and determine #PF error code.
>> >
>> >No, fix the selftests, it's not KVM's responsibility to advertise the correct
>> >guest.MAXPHYADDR.
>>
>> In this case, host.MAXPHYADDR is 52 and EPT supports 4-level only thus can
>> translate up to 48 bits of GPA.
>
>There are a number of issues that KVM does not handle when
>guest.MAXPHYADDR < host.MAXPHYADDR. For starters, KVM doesn't raise a
>#GP in PAE mode when one of the address bits above guest.MAXPHYADDR is
>set in one of the PDPTRs.

These are long-standing issues I believe. Note: current KVM ABI doesn't enforce
guest.MAXPHYADDR = host.MAXPHYADDR regardless of "allow_smaller_maxphyaddr".

>
>Honestly, I think KVM should just disable EPT if the EPT tables can't
>support the CPU's physical address width.

Yes, it is an option. But I prefer to allow admin to override this (i.e., admin
still can enable EPT via module parameter) because those issues are not new and
disabling EPT doesn't prevent QEMU from launching guests w/ smaller MAXPHYADDR.

>
>> Here nothing visible to selftests or QEMU indicates that guest.MAXPHYADDR = 52
>> is invalid/incorrect. how can we say selftests are at fault and we should fix
>> them?
>
>In this case, the CPU is at fault, and you should complain to the CPU vendor.

Yeah, I agree with you and will check with related team inside Intel. My point
was just this isn't a selftest issue because not all information is disclosed
to the tests.

And I am afraid KVM as L1 VMM may run into this situation, i.e., only 4-level
EPT is supported but MAXPHYADDR is 52. So, KVM needs a fix anyway.
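
For reference, the arithmetic behind "4-level EPT can translate up to 48 bits
of GPA" can be sketched as below. This is only an illustration under the usual
assumption of 9 index bits per EPT level plus a 12-bit page offset. The helper
names are made up for this sketch; it is not the kvm_mmu_tdp_maxphyaddr() from
the patch, whose body is not shown in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: 12-bit offset within a 4 KiB page, 9 index bits per level. */
#define PAGE_OFFSET_BITS   12
#define BITS_PER_EPT_LEVEL 9

/* Hypothetical helper: number of GPA bits an EPT walk of 'levels' can cover. */
static inline unsigned int ept_gpa_bits(unsigned int levels)
{
	return PAGE_OFFSET_BITS + BITS_PER_EPT_LEVEL * levels;
}

/*
 * Hypothetical helper: the largest guest MAXPHYADDR for which every guest
 * physical address can still be mapped, i.e. min(host width, EPT width).
 */
static inline unsigned int mappable_maxphyaddr(unsigned int host_maxphyaddr,
					       unsigned int ept_levels)
{
	unsigned int ept_bits = ept_gpa_bits(ept_levels);

	return host_maxphyaddr < ept_bits ? host_maxphyaddr : ept_bits;
}

/* Hypothetical helper: true if 'gpa' has no bits set above the EPT width. */
static inline bool gpa_is_mappable(uint64_t gpa, unsigned int ept_levels)
{
	unsigned int bits = ept_gpa_bits(ept_levels);

	/* Avoid an out-of-range shift if the computed width reaches 64 bits. */
	if (bits >= 64)
		return true;

	return (gpa >> bits) == 0;
}
```

With host.MAXPHYADDR = 52, mappable_maxphyaddr(52, 4) gives 48 while
mappable_maxphyaddr(52, 5) gives 52, and a GPA with bit 48 set fails
gpa_is_mappable(gpa, 4). That is the case the six selftests hit when they place
a RAM region close to max_gfn.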