On Fri, Jan 10, 2025 at 05:04:07PM -0800, Sean Christopherson wrote: > When resetting a dirty ring, conditionally reschedule on each iteration > after the first. The recently introduced hard limit mitigates the issue > of an endless reset, but isn't sufficient to completely prevent RCU > stalls, soft lockups, etc., nor is the hard limit intended to guard > against such badness. > > Note! Take care to check for reschedule even in the "continue" paths, > as a pathological scenario (or malicious userspace) could dirty the same > gfn over and over, i.e. always hit the continue path. > > rcu: INFO: rcu_sched self-detected stall on CPU > rcu: 4-....: (5249 ticks this GP) idle=51e4/1/0x4000000000000000 softirq=309/309 fqs=2563 > rcu: (t=5250 jiffies g=-319 q=608 ncpus=24) > CPU: 4 UID: 1000 PID: 1067 Comm: dirty_log_test Tainted: G L 6.13.0-rc3-17fa7a24ea1e-HEAD-vm #814 > Tainted: [L]=SOFTLOCKUP > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 > RIP: 0010:kvm_arch_mmu_enable_log_dirty_pt_masked+0x26/0x200 [kvm] > Call Trace: > <TASK> > kvm_reset_dirty_gfn.part.0+0xb4/0xe0 [kvm] > kvm_dirty_ring_reset+0x58/0x220 [kvm] > kvm_vm_ioctl+0x10eb/0x15d0 [kvm] > __x64_sys_ioctl+0x8b/0xb0 > do_syscall_64+0x5b/0x160 > entry_SYSCALL_64_after_hwframe+0x4b/0x53 > </TASK> > Tainted: [L]=SOFTLOCKUP > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 > RIP: 0010:kvm_arch_mmu_enable_log_dirty_pt_masked+0x17/0x200 [kvm] > Call Trace: > <TASK> > kvm_reset_dirty_gfn.part.0+0xb4/0xe0 [kvm] > kvm_dirty_ring_reset+0x58/0x220 [kvm] > kvm_vm_ioctl+0x10eb/0x15d0 [kvm] > __x64_sys_ioctl+0x8b/0xb0 > do_syscall_64+0x5b/0x160 > entry_SYSCALL_64_after_hwframe+0x4b/0x53 > </TASK> > > Fixes: fb04a1eddb1a ("KVM: X86: Implement ring-based dirty memory tracking") > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx> > --- > virt/kvm/dirty_ring.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c > index a81ad17d5eef..37eb2b7142bd 100644 > --- a/virt/kvm/dirty_ring.c > +++ b/virt/kvm/dirty_ring.c > @@ -133,6 +133,16 @@ int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring, > > ring->reset_index++; > (*nr_entries_reset)++; > + > + /* > + * While the size of each ring is fixed, it's possible for the > + * ring to be constantly re-dirtied/harvested while the reset > + * is in-progress (the hard limit exists only to guard against > + * wrapping the count into negative space). > + */ > + if (!first_round) > + cond_resched(); > + Will cond_resched() per entry be too frequent? Could we combine the cond_resched() per ring? e.g. if (count >= ring->soft_limit) cond_resched(); or simply while (count < ring->size) { ... } > /* > * Try to coalesce the reset operations when the guest is > * scanning pages in the same slot. > -- > 2.47.1.613.gc27f4b7a9f-goog >