On Wed, Apr 12, 2017 at 02:23:15PM -0300, Marcelo Tosatti wrote:
>
> The disablement of interrupts at KVM_SET_CLOCK/KVM_GET_CLOCK
> attempts to disable interrupts in that section to protect
> the values that are calculated in that section from interrupt interference.
>
> now_ns is calculated inside the irq protected region,
> user_ns.clock is passed from userspace (therefore not susceptible
> to interrupt variation).
>
> About the line
> now_ns = __get_kvmclock_ns(kvm); (1)
>
> Interrupts can happen afterwards local_irq_enable(),
> rendering "now_ns" relative to its execution time PLUS
> interrupt time.
>
> Therefore the local_irq_disable() / local_irq_enable() protection is not
> necessary (that is: interrupts triggering after local_irq_enable cause
> the same problem that the protection is trying to avoid).
>
> With this reasoning, and the -RT bug that the irq disablement causes
> (because spin_lock is now a sleeping lock), remove the IRQ protection as it causes:
>
> [ 1064.668109] in_atomic(): 0, irqs_disabled(): 1, pid: 15296, name:m
> [ 1064.668110] INFO: lockdep is turned off.
> [ 1064.668110] irq event stamp: 0
> [ 1064.668112] hardirqs last enabled at (0): [< (null)>] )
> [ 1064.668116] hardirqs last disabled at (0): [<ffffffff9308184a>] c0
> [ 1064.668118] softirqs last enabled at (0): [<ffffffff9308184a>] c0
> [ 1064.668118] softirqs last disabled at (0): [< (null)>] )
> [ 1064.668121] CPU: 13 PID: 15296 Comm: qemu-kvm Not tainted 3.10.0-1
> [ 1064.668121] Hardware name: Dell Inc. PowerEdge R730/0H21J3, BIOS 5
> [ 1064.668123] ffff8c1796b88000 00000000afe7344c ffff8c179abf3c68 f3
> [ 1064.668125] ffff8c179abf3c90 ffffffff930ccb3d ffff8c1b992b3610 f0
> [ 1064.668126] 00007ffc1a26fbc0 ffff8c179abf3cb0 ffffffff9375f694 f0
> [ 1064.668126] Call Trace:
> [ 1064.668132] [<ffffffff93757413>] dump_stack+0x19/0x1b
> [ 1064.668135] [<ffffffff930ccb3d>] __might_sleep+0x12d/0x1f0
> [ 1064.668138] [<ffffffff9375f694>] rt_spin_lock+0x24/0x60
> [ 1064.668155] [<ffffffffc06ab996>] __get_kvmclock_ns+0x36/0x110 [k]
> [ 1064.668159] [<ffffffff93112993>] ? futex_wait_queue_me+0x103/0x10
> [ 1064.668171] [<ffffffffc06b8782>] kvm_arch_vm_ioctl+0xa2/0xd70 [k]
> [ 1064.668173] [<ffffffff9311333c>] ? futex_wait+0x1ac/0x2a0
>
> On -RT kernels.

Hmm, __get_kvmclock_ns used not to need a spinlock back when it was
added... Why does it now?

Looking at its current state, I'm not sure I understand what it's
supposed to do: it uses the host tsc rate rather than the guest one,
which seems to just defeat the purpose of originally introducing it: to
have a way to obtain the clock value exactly the same as the guest would
see...

Am I missing anything obvious?

Thanks,
Roman.