Re: [PATCH 0/2] sync_regs() TOCTOU issues

Sean Christopherson <seanjc@xxxxxxxxxx> · Tue, 15 Aug 2023 11:15:39 -0700

On Tue, Aug 15, 2023, Michal Luczaj wrote:
> On 8/15/23 17:40, Sean Christopherson wrote:
> > On Tue, Aug 15, 2023, Michal Luczaj wrote:
> >>> @@ -115,6 +116,7 @@ static void *race_events_exc(void *arg)
> >>>  	for (;;) {
> >>>  		WRITE_ONCE(run->kvm_dirty_regs, KVM_SYNC_X86_EVENTS);
> >>>  		WRITE_ONCE(events->flags, 0);
> >>> +		WRITE_ONCE(events->exception.nr, GP_VECTOR);
> >>>  		WRITE_ONCE(events->exception.pending, 1);
> >>>  		WRITE_ONCE(events->exception.nr, 255);
> >>
> >> Here you're setting events->exception.nr twice. Is it deliberate?
> > 
> > Heh, yes and no.  It's partly leftover from a brief attempt to gracefully eat the
> > fault in the guest.
> > 
> > However, unless there's magic I'm missing, race_events_exc() needs to set a "good"
> > vector in every iteration, otherwise only the first iteration will be able to hit
> > the "check good, consume bad" scenario.
> 
> I think I understand what you mean. I see things slightly different: because
> 
> 	if (events->flags & KVM_VCPUEVENT_VALID_PAYLOAD) {
> 		...
> 	} else {
> 		events->exception.pending = 0;
> 		events->exception_has_payload = 0;
> 	}
> 
> zeroes exception.pending on every iteration, even though exception.nr may
> already be > 31, KVM does not necessary return -EINVAL at
> 
> 	if ((events->exception.injected || events->exception.pending) &&
> 	    (events->exception.nr > 31 || events->exception.nr == NMI_VECTOR))
> 		return -EINVAL;
> 
> It would if the racer set exception.pending before this check, but if it does it
> after the check, then KVM goes
> 
> 	vcpu->arch.exception.pending = events->exception.pending;
> 	vcpu->arch.exception.vector = events->exception.nr;
> 
> which later triggers the WARN. That said, if I you think setting and re-setting
> exception.nr is more efficient (as in: racy), I'm all for it.

My goal isn't to make it easier to hit the *known* TOCTOU, it's to make the test
more valuable after that known bug has been fixed.  I.e. I don't want to rely on
KVM to update kvm_run (which was arguably a bug even if there weren't a TOCTOU
issue).  It's kinda silly, because realistically this test is likely only ever
going to find TOCTOU bugs, but so long as the test can consistently the known bug,
my preference is to make it as "generic" as possible from a coverage perspective.