Re: Nested VMX - L1 hangs on running L2

Bandan Das <bandan.das@xxxxxxxxxxx> · Wed, 20 Jul 2011 16:42:29 -0400

Nadav Har'El <NYH@xxxxxxxxxx> wrote:
> > No, both patches are wrong.
> 
> Guys, thanks for looking into this bug. I'm afraid I'm still at a loss at
> why a TSC bug would even cause a guest lockup :(
> 
> When Avi Kivity saw my nested TSC handling code he remarked "this is
> probably wrong". When I asked him where it was wrong, he basically said
> that he didn't know where, but TSC handling code is always wrong ;-)
> And it turns out he was right.
> 
> > The correct fix is to make kvm_get_msr() return the L1 guest TSC at all
> > times. We are serving the L1 guest in this hypervisor, not the L2 guest,
> > and so should never read its TSC for any reason.
> ...
> > allows the L2 guest to overwrite the L1 guest TSC, which at first seems
> > wrong, but is in fact the correct virtualization of a security hole in
> > the L1 guest.
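
To make the offset arithmetic concrete: with TSC offsetting enabled, a guest's RDTSC returns the host TSC plus the TSC_OFFSET field (a 64-bit add that wraps). The sketch below is a minimal model, assuming, as in the prepare_vmcs02() code quoted further down, that L0 runs L2 with vmcs01_tsc_offset + vmcs12->tsc_offset. It shows why reading "the guest TSC" from the currently loaded offset while L2 runs yields L2's value, not L1's. Both function names are hypothetical, not KVM functions.

```c
#include <stdint.h>

/* Hypothetical helper: L1's view of the TSC. Offsets are stored as
 * uint64_t, so a "negative" offset works via two's-complement wraparound. */
static uint64_t l1_tsc(uint64_t host_tsc, uint64_t vmcs01_tsc_offset)
{
	return host_tsc + vmcs01_tsc_offset;
}

/* Hypothetical helper: L2's view when L0 composes both offsets into
 * vmcs02, as the quoted prepare_vmcs02() code does. */
static uint64_t l2_tsc(uint64_t host_tsc, uint64_t vmcs01_tsc_offset,
		       uint64_t vmcs12_tsc_offset)
{
	return host_tsc + vmcs01_tsc_offset + vmcs12_tsc_offset;
}
```

Under the fix proposed above, kvm_get_msr() would always report the l1_tsc() value, whichever level happens to be running.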
> 
> I think I'm beginning to see the error in my ways...
> 
> When L1 lets L2 (using the MSR bitmap) direct read/write access to the TSC,
> it doesn't want L0 to be "clever" and give L2 its own separate TSC (like
> I do now), but rather gives it full control over L1's TSC - so reading or
> writing it should actually return L1's TSC, and the TSC_OFFSET in vmcs12
> is to be ignored.
> 
> So basically, if I understand correctly, what I need to change is
> in prepare_vmcs02(), if the MSR_IA32_TSC is on the MSR bitmap (read?
> write?), instead of doing
>         vmcs_write64(TSC_OFFSET,
>                 vmx->nested.vmcs01_tsc_offset + vmcs12->tsc_offset);
> I just need to do
>         vmcs_write64(TSC_OFFSET,
>                 vmx->nested.vmcs01_tsc_offset);
> thereby giving L2 exactly the same TSC that L1 had.
> Bandan, if I remember correctly, you once tried this sort of fix and
> it actually worked?

That is correct. It is still the "workaround fix" I have been using
on my systems. But as you mention above (and below), I am still struggling
with two questions:

1. Why does L1 hang at all, even if the TSC has wrong values?
2. Why do I see this on a Dell R610 when you and some others don't?
I assumed from the symptoms that this should be fairly easy to reproduce on any system.

Bandan
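
A sketch of the conditional in the prepare_vmcs02() change proposed above, assuming the standard VMX MSR bitmap layout from the Intel SDM: a 4 KiB page whose first 1 KiB is the read bitmap for MSRs 0x0 to 0x1fff and whose third 1 KiB (offset 0x800) is the write bitmap for the same range, where a set bit means the access causes a VM exit. It also assumes L1 has enabled MSR bitmaps in vmcs12 at all. The helper names are hypothetical, and since the mail leaves "read? write?" open, requiring both directions to be passed through is this sketch's own choice, not an answer to that question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_TSC      0x10u
#define BITMAP_READ_LOW   0x000u  /* read bitmap, MSRs 0x0-0x1fff */
#define BITMAP_WRITE_LOW  0x800u  /* write bitmap, MSRs 0x0-0x1fff */

/* Hypothetical helper: true if the bitmap makes this access exit.
 * Only the low MSR range is handled in this sketch. */
static bool msr_intercepted(const uint8_t bitmap[4096], uint32_t region,
			    uint32_t msr)
{
	assert(msr <= 0x1fffu);
	return (bitmap[region + msr / 8] >> (msr % 8)) & 1u;
}

/* Hypothetical: the TSC_OFFSET L0 would load into vmcs02 under the
 * proposal, given L1's MSR bitmap from vmcs12. */
static uint64_t vmcs02_tsc_offset(const uint8_t msr_bitmap12[4096],
				  uint64_t vmcs01_tsc_offset,
				  uint64_t vmcs12_tsc_offset)
{
	bool passthrough =
		!msr_intercepted(msr_bitmap12, BITMAP_READ_LOW, MSR_IA32_TSC) &&
		!msr_intercepted(msr_bitmap12, BITMAP_WRITE_LOW, MSR_IA32_TSC);

	if (passthrough)
		return vmcs01_tsc_offset;  /* give L2 exactly L1's TSC */
	return vmcs01_tsc_offset + vmcs12_tsc_offset;  /* current behaviour */
}
```

Whether this is the right condition at all is exactly what the thread is trying to settle; the sketch only pins down what "if the TSC MSR is passed through" would mean in bitmap terms.
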
> Then, guest_read_tsc() will return the correct L1 TSC (without any
> change to this code).
> 
> And vmx_write_tsc_offset() should, in the is_guest_mode() case, not do
> what it does now (vmcs12->tsc_offset is of no importance when the TSC MSR
> is passed through), but rather set vmcs01_tsc_offset (which will be
> applied on the next exit to L1).
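
A minimal model of the vmx_write_tsc_offset() behaviour suggested above: while L2 runs (is_guest_mode()), a new offset is recorded in vmcs01_tsc_offset and only takes effect on the next exit to L1. The structs, their fields, and the enter/exit helpers are hypothetical stand-ins, not the real struct vcpu_vmx. hw_tsc_offset models the TSC_OFFSET field of whichever VMCS is currently loaded, and entry assumes the proposed pass-through case, where vmcs02 reuses L1's offset unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct nested_sketch {
	bool guest_mode;             /* models is_guest_mode() */
	uint64_t vmcs01_tsc_offset;  /* L1's offset, saved while L2 runs */
};

struct vcpu_sketch {
	struct nested_sketch nested;
	uint64_t hw_tsc_offset;      /* TSC_OFFSET of the loaded VMCS */
};

/* Entry to L2: save L1's offset; vmcs02 gets the same value. */
static void enter_l2(struct vcpu_sketch *v)
{
	assert(!v->nested.guest_mode);
	v->nested.vmcs01_tsc_offset = v->hw_tsc_offset;
	v->nested.guest_mode = true;
}

/* Proposed write path: in guest mode, defer the update to exit time. */
static void write_tsc_offset(struct vcpu_sketch *v, uint64_t offset)
{
	if (v->nested.guest_mode)
		v->nested.vmcs01_tsc_offset = offset;
	else
		v->hw_tsc_offset = offset;  /* vmcs01 is loaded */
}

/* Exit to L1: vmcs01 is current again and L1's offset is applied. */
static void exit_to_l1(struct vcpu_sketch *v)
{
	assert(v->nested.guest_mode);
	v->nested.guest_mode = false;
	v->hw_tsc_offset = v->nested.vmcs01_tsc_offset;
}
```

The deferred write means L2 keeps running with the old offset until the next exit, which matches the "applied on the next exit to L1" wording rather than updating vmcs02 in place.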
> 
> Is my analysis correct? Or perhaps completely wrong? ;-)
> Am I missing anything else that should be changed?
> 
> In any case, I don't understand why I never encountered these problems
> on my machine, and nothing broke even when I replaced the TSC nesting
> code with randomly broken code. Are the people who are seeing this
> breakage actually passing the MSR from L1 to L2 (using the MSR bitmap),
> as I guessed above? Or am I missing something completely different?
> 
> Sorry, but I'm really becoming confused by these TSC issues...
> 
> Thanks,
> Nadav.
> 
