Re: [PATCH v3 04/22] KVM: x86: Set vCPU exit reason to KVM_EXIT_UNKNOWN at the start of KVM_RUN

Sean Christopherson <seanjc@xxxxxxxxxx> · Tue, 2 May 2023 11:51:44 -0700

On Tue, May 02, 2023, Anish Moorthy wrote:
> During some testing yesterday I realized that this patch actually
> breaks the self test, causing an error which the later self test
> changes cover up.
> 
> Running "./demand_paging_test -b 512M -u MINOR -s shmem -v 1" from
> kvm/next (b3c98052d469) with just this patch applies gives the
> following output
> 
> > # ./demand_paging_test -b 512M -u MINOR -s shmem -v 1
> > Testing guest mode: PA-bits:ANY, VA-bits:48,  4K pages
> > guest physical test memory: [0x7fcdfffe000, 0x7fcffffe000)
> > Finished creating vCPUs and starting uffd threads
> > Started all vCPUs
> > ==== Test Assertion Failure ====
> >  demand_paging_test.c:50: false
> >  pid=13293 tid=13297 errno=4 - Interrupted system call
> >  // Some stack trace stuff
> >  Invalid guest sync status: exit_reason=UNKNOWN, ucall=0
> 
> The problem is the get_ucall() part of the following block in the self
> test's vcpu_worker()
> 
> > ret = _vcpu_run(vcpu);
> > TEST_ASSERT(ret == 0, "vcpu_run failed: %d\n", ret);
> > if (get_ucall(vcpu, NULL) != UCALL_SYNC) {
> >    TEST_ASSERT(false,
> >                               "Invalid guest sync status: exit_reason=%s\n",
> >                               exit_reason_str(run->exit_reason));
> > }
> 
> I took a look and, while get_ucall() does depend on the value of
> exit_reason, the error's root cause isn't clear to me yet.

Stating what you likely already know... On x86, the UCALL is performed via port
I/O, and so the selftests framework zeros out the ucall struct if the userspace
exit reason isn't KVM_EXIT_IO.

> Moving the "exit_reason = kvm_exit_unknown" line to later in the
> function, right above the vcpu_run() call "fixes" the problem. I've
> done that for now and will bisect later to investigate: if anyone
> has any clues please let me know.

Clobbering vcpu->run->exit_reason before this code block is a bug:

	if (unlikely(vcpu->arch.complete_userspace_io)) {
		int (*cui)(struct kvm_vcpu *) = vcpu->arch.complete_userspace_io;
		vcpu->arch.complete_userspace_io = NULL;
		r = cui(vcpu);
		if (r <= 0)
			goto out;
	} else {
		WARN_ON_ONCE(vcpu->arch.pio.count);
		WARN_ON_ONCE(vcpu->mmio_needed);
	}

	if (kvm_run->immediate_exit) {
		r = -EINTR;
		goto out;
	}

For userspace I/O and MMIO, KVM requires userspace to "complete" the instruction
that triggered the exit to userspace, e.g. write memory/registers and skip the
instruction as needed.  The immediate_exit flag is set by userspace when userspace
wants to retain control and is doing KVM_RUN purely to placate KVM.  In selftests,
this is done by vcpu_run_complete_io().

The one part I'm a bit surprised by is that this caused ucall problems.  The ucall
framework invokes vcpu_run_complete_io() _after_ it grabs the information. 

	addr = ucall_arch_get_ucall(vcpu);
	if (addr) {
		TEST_ASSERT(addr != (void *)GUEST_UCALL_FAILED,
			    "Guest failed to allocate ucall struct");

		memcpy(uc, addr, sizeof(*uc));
		vcpu_run_complete_io(vcpu);
	} else {
		memset(uc, 0, sizeof(*uc));
	}

Making multiple calls to get_ucall() after a single guest ucall would explain
everything as only the first get_ucall() would succeed, but AFAICT the test doesn't
invoke get_ucall() multiple times.

Aha!  Found it.  _vcpu_run() invokes assert_on_unhandled_exception(), which does

	if (get_ucall(vcpu, &uc) == UCALL_UNHANDLED) {
		uint64_t vector = uc.args[0];

		TEST_FAIL("Unexpected vectored event in guest (vector:0x%lx)",
			  vector);
	}

and thus triggers vcpu_run_complete_io() before demand_paging_test's vcpu_worker()
gets control and does _its_ get_ucall().