2015-04-17 21:23-0300, Marcelo Tosatti: > > From: Radim Krčmář <rkrcmar@xxxxxxxxxx> > > As noted by Andy Lutomirski, kvm does not follow the documented version > protocol. Fix it. > > Note: this bug results in a race which can occur if the following three > conditions are met: > > 1) There is KVM guest time update (there is one every 5 minutes). > > 2) Which races with a thread in the guest in the following way: > The execution of these 29 instructions has to take at _least_ > 2 seconds (rebalance interval is 1 second). > > lsl %r9w,%esi > mov %esi,%r8d > and $0x3f,%esi > and $0xfff,%r8d > test $0xfc0,%r8d > jne 0xa12 <vread_pvclock+210> > shl $0x6,%rsi > mov -0xa01000(%rsi),%r10d > data32 xchg %ax,%ax > data32 xchg %ax,%ax > rdtsc > shl $0x20,%rdx > mov %eax,%eax > movsbl -0xa00fe4(%rsi),%ecx > or %rax,%rdx > sub -0xa00ff8(%rsi),%rdx > mov -0xa00fe8(%rsi),%r11d > mov %rdx,%rax > shl %cl,%rax > test %ecx,%ecx > js 0xa08 <vread_pvclock+200> > mov %r11d,%edx > movzbl -0xa00fe3(%rsi),%ecx > mov -0xa00ff0(%rsi),%r11 > mul %rdx > shrd $0x20,%rdx,%rax > data32 xchg %ax,%ax > data32 xchg %ax,%ax > lsl %r9w,%edx > > 3) Scheduler moves the task, while executing these 29 instructions, to a > destination processor, then back to the source processor. > > 4) Source processor, after has been moved back from destination, > perceives data out of order as written by processor performing guest > time update (item 1), with string mov. > > Given the rarity of this condition, and the fact it was never observed > or reported, reverting pvclock vsyscall on systems whose host is > susceptible to the race, seems an excessive measure. > > Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx> > Cc: stable@xxxxxxxxxx Thanks. Reviewed-or-Signed-off-by: Radim Krčmář <rkrcmar@xxxxxxxxxx> Like most code, I would have written it differently now :) > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > + kvm_write_guest_cached(v->kvm, &vcpu->pv_time, > + &guest_hv_clock, > + sizeof(guest_hv_clock)); The easiest optimization is replacing sizeof(guest_hv_clock) with offsetof(typeof(guest_hv_clock), version) + sizeof(guest_hv_clock.version) because kvm_write_guest_cached() allows writing of prefixes. This still won't get optimized to a simple MOV at compile time, but saves few mov bytes. (Offset of version is 0 now, so using 'sizeof guest_hv_clock.version' is just a minor offence sand saves some hard to read code.) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html