On Sat, 25 Dec 2021 at 09:19, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Thu, Dec 23, 2021, Paolo Bonzini wrote:
> > On 12/22/21 16:18, David Woodhouse wrote:
> > > > > n a mapped
> > > > > shared info page.
> > > > >
> > > > > This can be used for routing MSI of passthrough devices to PIRQ event
> > > > > channels in a Xen guest, and we can build on it for delivering IPIs and
> > > > > timers directly from the kernel too.
> > > > Queued patches 1-5, thanks.
> > > >
> > > > Paolo
> > > Any word on when these will make it out to kvm.git? Did you find
> > > something else I need to fix?
> >
> > I got a WARN when testing the queue branch, now I'm bisecting.
>
> Was it perhaps this WARN?
>
> WARNING: CPU: 5 PID: 2180 at arch/x86/kvm/../../../virt/kvm/kvm_main.c:3163 mark_page_dirty_in_slot+0x6b/0x80 [kvm]
> Modules linked in: kvm_intel kvm irqbypass
> CPU: 5 PID: 2180 Comm: hyperv_clock Not tainted 5.16.0-rc4+ #81
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> RIP: 0010:mark_page_dirty_in_slot+0x6b/0x80 [kvm]
> kvm_write_guest+0x117/0x120 [kvm]
> kvm_hv_invalidate_tsc_page+0x99/0xf0 [kvm]
> kvm_arch_vm_ioctl+0x20f/0xb60 [kvm]
> kvm_vm_ioctl+0x711/0xd50 [kvm]
> __x64_sys_ioctl+0x7f/0xb0
> do_syscall_64+0x38/0xc0
> entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> If so, it was clearly introduced by commit 03c0304a86bc ("KVM: Warn if mark_page_dirty()
> is called without an active vCPU"). But that change is 100% correct as it's trivial
> to crash KVM without its protection. E.g. running hyperv_clock with these mods
>
> diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
> index 2dd78c1db4b6..ae0d0490580a 100644
> --- a/tools/testing/selftests/kvm/lib/kvm_util.c
> +++ b/tools/testing/selftests/kvm/lib/kvm_util.c
> @@ -371,6 +371,8 @@ struct kvm_vm *vm_create_with_vcpus(enum vm_guest_mode mode, uint32_t nr_vcpus,
>         vm_create_irqchip(vm);
>  #endif
>
> +       vm_enable_dirty_ring(vm, 0x10000);
> +
>         for (i = 0; i < nr_vcpus; ++i) {
>                 uint32_t vcpuid = vcpuids ? vcpuids[i] : i;
>
> diff --git a/tools/testing/selftests/kvm/x86_64/hyperv_clock.c b/tools/testing/selftests/kvm/x86_64/hyperv_clock.c
> index e0b2bb1339b1..1032695e3901 100644
> --- a/tools/testing/selftests/kvm/x86_64/hyperv_clock.c
> +++ b/tools/testing/selftests/kvm/x86_64/hyperv_clock.c
> @@ -212,6 +212,8 @@ int main(void)
>         vm = vm_create_default(VCPU_ID, 0, guest_main);
>         run = vcpu_state(vm, VCPU_ID);
>
> +       vm_mem_region_set_flags(vm, 0, KVM_MEM_LOG_DIRTY_PAGES);
> +
>         vcpu_set_hv_cpuid(vm, VCPU_ID);
>
>         tsc_page_gva = vm_vaddr_alloc_page(vm);
>
> Triggers a NULL pointer deref.
>
> BUG: kernel NULL pointer dereference, address: 0000000000000000
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 12959a067 P4D 12959a067 PUD 129594067 PMD 0
> Oops: 0000 [#1] SMP
> CPU: 5 PID: 1784 Comm: hyperv_clock Not tainted 5.16.0-rc4+ #82
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
> RIP: 0010:kvm_dirty_ring_get+0xe/0x30 [kvm]
> mark_page_dirty_in_slot.part.0+0x30/0x60 [kvm]
> kvm_write_guest+0x117/0x130 [kvm]
> kvm_hv_invalidate_tsc_page+0x99/0xf0 [kvm]
> kvm_arch_vm_ioctl+0x20f/0xb60 [kvm]
> kvm_vm_ioctl+0x711/0xd50 [kvm]
> __x64_sys_ioctl+0x7f/0xb0
> do_syscall_64+0x38/0xc0
> entry_SYSCALL_64_after_hwframe+0x44/0xae

I saw this WARNING when running the vanilla hyperv_clock test against kvm/queue,
i.e. without the modifications above.
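
If I read this correctly, both splats come from the same vCPU-less path:
kvm_hv_invalidate_tsc_page() runs from a VM-scoped ioctl (KVM_SET_CLOCK, per the
trace), where kvm_get_running_vcpu() returns NULL, so the new check in
mark_page_dirty_in_slot() fires; without that check, the dirty-ring case goes on
to dereference the NULL vcpu in kvm_dirty_ring_get(), which looks like the second
splat above. From memory, the check added by commit 03c0304a86bc is roughly the
following (not necessarily the exact upstream hunk):

	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();

#ifdef CONFIG_HAVE_KVM_DIRTY_RING
	/*
	 * Warn and bail when there is no running vCPU, e.g. for guest
	 * writes issued from a VM-scoped ioctl such as KVM_SET_CLOCK,
	 * instead of pushing to a non-existent per-vCPU dirty ring.
	 */
	if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
		return;
#endif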
>
> Commit e880c6ea55b9 ("KVM: x86: hyper-v: Prevent using not-yet-updated TSC page
> by secondary CPUs") is squarely to blame as it was added after dirty ring, though
> in Vitaly's defense, David put it best: "That's a fairly awful bear trap".
>
> Assuming there's nothing special about vcpu0's clock, I believe the below fixes
> all issues. If not, then the correct fix is to have vcpu0 block all other vCPUs
> until the update completes.

Could we instead invalidate hv->tsc_ref.tsc_sequence directly, without marking the
page dirty? After migration it is stale anyway until the first successful
kvm_hv_setup_tsc_page() call.
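
For illustration only, one way to read that idea, completely untested and assuming
the current shape of kvm_hv_invalidate_tsc_page() on kvm/queue: keep the existing
locking/status handling, but drop the kvm_write_guest() of the zeroed sequence and
only reset KVM's cached copy, so this vCPU-less path never reaches
mark_page_dirty():

	/*
	 * Untested sketch: invalidate only KVM's cached copy of the
	 * sequence instead of writing it out to the guest page here.
	 * The guest-visible page is rewritten, from vCPU context, by
	 * the next successful kvm_hv_setup_tsc_page() call anyway.
	 */
	hv->tsc_ref.tsc_sequence = 0;

    Wanpeng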