On Thu, Apr 13, 2017 at 02:06:36AM -0400, Pankaj Gupta wrote: > > > > > The disablement of interrupts at KVM_SET_CLOCK/KVM_GET_CLOCK > > attempts to disable interrupts in that section to protect > > the values that are calculated in that section from interrupt interference. > > > > now_ns is calculated inside the irq protected region, > > user_ns.clock is passed from userspace (therefore not susceptible > > to interrupt variation). > > > > About the line > > now_ns = __get_kvmclock_ns(kvm); (1) > > > > Interrupts can happen afterwards local_irq_enable(), > > rendering "now_ns" relative to its execution time PLUS > > interrupt time. > > Are you saying interrupt disable code here is to avoid any interrupt > latency before we update time? > > > > Therefore the local_irq_disable() / local_irq_enable() protection is not > > necessary (that is: interrupts triggering after local_irq_enable cause > > the same problem that the protection is trying to avoid). > > I am not sure because this could be different use-case for RT and mainline. > RT can afford to preempt, but if we do will we get desired results? > > Maybe if we want this code path to be "irq safe", can we use 'raw_spin_lock' > in place of 'spin_lock'. commit 395c6b0a9d56fe7fdb7aeda12795d0eb02475d24 Author: Avi Kivity <avi@xxxxxxxxxx> Date: Mon Oct 4 12:55:49 2010 +0200 KVM: Disable interrupts around get_kernel_ns() get_kernel_ns() wants preemption disabled. It doesn't make a lot of sense during the get/set ioctls (no way to make them non-racy) but the callee wants it. Signed-off-by: Avi Kivity <avi@xxxxxxxxxx> And static inline u64 get_kernel_ns(void) { struct timespec ts; WARN_ON(preemptible()); ktime_get_ts(&ts); monotonic_to_bootbased(&ts); return timespec_to_ns(&ts); } Which was added by: Add a helper function to compute the kernel time and convert nanoseconds back to CPU specific cycles. Note that these must not be called in preemptible context, as that would mean the kernel could enter software suspend state, which would cause non-atomic operation. However this comment makes no sense: ktime_get_ts(&ts) returns a clock value relative to either before or after software suspend (either case is fine). > > With this reasoning, and the -RT bug that the irq disablement causes > > (because spin_lock is now a sleeping lock), remove the IRQ protection as it > > causes: > > > > [ 1064.668109] in_atomic(): 0, irqs_disabled(): 1, pid: 15296, name:m > > [ 1064.668110] INFO: lockdep is turned off. > > [ 1064.668110] irq event stamp: 0 > > [ 1064.668112] hardirqs last enabled at (0): [< (null)>] ) > > [ 1064.668116] hardirqs last disabled at (0): [] c0 > > [ 1064.668118] softirqs last enabled at (0): [] c0 > > [ 1064.668118] softirqs last disabled at (0): [< (null)>] ) > > [ 1064.668121] CPU: 13 PID: 15296 Comm: qemu-kvm Not tainted 3.10.0-1 > > [ 1064.668121] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 5 > > [ 1064.668123] ffff8c1796b88000 00000000afe7344c ffff8c179abf3c68 f3 > > [ 1064.668125] ffff8c179abf3c90 ffffffff930ccb3d ffff8c1b992b3610 f0 > > [ 1064.668126] 00007ffc1a26fbc0 ffff8c179abf3cb0 ffffffff9375f694 f0 > > [ 1064.668126] Call Trace: > > [ 1064.668132] [] dump_stack+0x19/0x1b > > [ 1064.668135] [] __might_sleep+0x12d/0x1f0 > > [ 1064.668138] [] rt_spin_lock+0x24/0x60 > > [ 1064.668155] [] __get_kvmclock_ns+0x36/0x110 [k] > > [ 1064.668159] [] ? futex_wait_queue_me+0x103/0x10 > > [ 1064.668171] [] kvm_arch_vm_ioctl+0xa2/0xd70 [k] > > [ 1064.668173] [] ? futex_wait+0x1ac/0x2a0 > > Even with this patch, we might get same error for: > > get_kvmclock_ns { > > local_irq_save(flags); > ns = __get_kvmclock_ns(kvm); > local_irq_restore(flags); > > > } Yes, should fix that as well. > > > > On -RT kernels. > > > > > > Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx> > > > > --- > > > > arch/x86/kvm/x86.c | 4 ---- > > 1 file changed, 4 deletions(-) > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > index 7c39b8a..935143b 100644 > > --- a/arch/x86/kvm/x86.c > > +++ b/arch/x86/kvm/x86.c > > @@ -4062,10 +4062,8 @@ long kvm_arch_vm_ioctl(struct file *filp, > > goto out; > > > > r = 0; > > - local_irq_disable(); > > now_ns = __get_kvmclock_ns(kvm); > > kvm->arch.kvmclock_offset += user_ns.clock - now_ns; > > - local_irq_enable(); > > kvm_gen_update_masterclock(kvm); > > break; > > } > > @@ -4073,11 +4071,9 @@ long kvm_arch_vm_ioctl(struct file *filp, > > struct kvm_clock_data user_ns; > > u64 now_ns; > > > > - local_irq_disable(); > > now_ns = __get_kvmclock_ns(kvm); > > user_ns.clock = now_ns; > > user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0; > > - local_irq_enable(); > > memset(&user_ns.pad, 0, sizeof(user_ns.pad)); > > > > r = -EFAULT; > >