Hi Sebastian,

On 16/08/2019 16:23, Sebastian Andrzej Siewior wrote:
> On 2019-08-16 16:18:20 [+0100], Julien Grall wrote:
>> Sadly, I managed to hit the same BUG_ON() today with this patch
>> applied on top of v5.2-rt1-rebase. :/ It is more difficult to hit
>> than before, though.
>>
>> [  157.449545] 000: BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:968
>> [  157.449569] 000: in_atomic(): 1, irqs_disabled(): 0, pid: 990, name: kvm-vcpu-1
>> [  157.449579] 000: 2 locks held by kvm-vcpu-1/990:
>> [  157.449592] 000:  #0: 00000000c2fc8217 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0x70/0xae0
>> [  157.449638] 000:  #1: 0000000096863801 (&cpu_base->softirq_expiry_lock){+.+.}, at: hrtimer_grab_expiry_lock+0x24/0x40
>> [  157.449677] 000: Preemption disabled at:
>> [  157.449679] 000: [<ffff0000111a4538>] schedule+0x30/0xd8
>> [  157.449702] 000: CPU: 0 PID: 990 Comm: kvm-vcpu-1 Tainted: G        W         5.2.0-rt1-00001-gd368139e892f #104
>> [  157.449712] 000: Hardware name: ARM LTD ARM Juno Development Platform/ARM Juno Development Platform, BIOS EDK II Jan 23 2017
>> [  157.449718] 000: Call trace:
>> [  157.449722] 000:  dump_backtrace+0x0/0x130
>> [  157.449730] 000:  show_stack+0x14/0x20
>> [  157.449738] 000:  dump_stack+0xbc/0x104
>> [  157.449747] 000:  ___might_sleep+0x198/0x238
>> [  157.449756] 000:  rt_spin_lock+0x5c/0x70
>> [  157.449765] 000:  hrtimer_grab_expiry_lock+0x24/0x40
>> [  157.449773] 000:  hrtimer_cancel+0x1c/0x38
>> [  157.449780] 000:  kvm_timer_vcpu_load+0x78/0x3e0
>
> …
>> I will do some debug and see what I can find.
>
> which timer is this? Is there another one?

It looks like the timer is the background timer (bg_timer). The BUG()
also seems to trigger with the other timers, although less often. All of
them have already been converted.

Interestingly, hrtimer_grab_expiry_lock() may be called for a timer even
when is_soft (which I assume means the softirq will not be used) is 0:
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index 7d7db8802131..fe05e553dea2 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -934,6 +934,9 @@ void hrtimer_grab_expiry_lock(const struct hrtimer *timer)
 {
 	struct hrtimer_clock_base *base = timer->base;
 
+	WARN(!preemptible(), "is_soft %u base %p base->cpu_base %p\n",
+	     timer->is_soft, base, base ? base->cpu_base : NULL);
+
 	if (base && base->cpu_base) {
 		spin_lock(&base->cpu_base->softirq_expiry_lock);
 		spin_unlock(&base->cpu_base->softirq_expiry_lock);

[  576.291886] 004: is_soft 0 base ffff80097eed44c0 base->cpu_base ffff80097eed4380

Because the hrtimer is started when scheduling out the vCPU and canceled
when scheduling it back in, there is no guarantee the hrtimer will be
queued on the same pCPU that cancels it. So I think the following can
happen:

CPU0                                      | CPU1
                                          |
                                          | hrtimer_interrupt()
                                          |   raw_spin_lock_irqsave(&cpu_base->lock)
hrtimer_cancel()                          |   __hrtimer_run_queues()
  hrtimer_try_to_cancel()                 |     __run_hrtimer()
    lock_hrtimer_base()                   |       base->running = timer;
                                          |       raw_spin_unlock_irqrestore(&cpu_base->lock)
    raw_spin_lock_irqsave(cpu_base->lock) |       fn(timer);
    hrtimer_callback_running()            |

hrtimer_callback_running() will return true because the callback is
running somewhere else. This means hrtimer_try_to_cancel() will return
-1, and therefore hrtimer_grab_expiry_lock() will be called.

Did I miss anything?

Cheers,

-- 
Julien Grall
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm