On Tue, 18 Feb 2020 at 22:54, Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote:
>
> Wanpeng Li <kernellwp@xxxxxxxxx> writes:
>
> > From: Wanpeng Li <wanpengli@xxxxxxxxxxx>
> >
> > During vCPU creation, a kvmclock sync worker is queued to the global
> > workqueue before each vCPU creation completes. Each worker is scheduled
> > after a 300 * HZ delay, requests a kvmclock update for all vCPUs, and kicks
> > them out. This gets especially bad when scaling to large VMs because of the
> > large number of vmexits. A single worker acting as leader is enough to
> > trigger the kvmclock sync request for all vCPUs.
> >
> > Signed-off-by: Wanpeng Li <wanpengli@xxxxxxxxxxx>
> > ---
> > v3 -> v4:
> >  * check vcpu->vcpu_idx
> >
> >  arch/x86/kvm/x86.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index fb5d64e..d0ba2d4 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -9390,8 +9390,9 @@ void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
> >          if (!kvmclock_periodic_sync)
> >                  return;
> >
> > -        schedule_delayed_work(&kvm->arch.kvmclock_sync_work,
> > -                                        KVMCLOCK_SYNC_PERIOD);
> > +        if (vcpu->vcpu_idx == 0)
> > +                schedule_delayed_work(&kvm->arch.kvmclock_sync_work,
> > +                                                KVMCLOCK_SYNC_PERIOD);
> >  }
> >
> >  void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>
> Forgive my ignorance, but I was under the impression that
> schedule_delayed_work() doesn't do anything if the work is already
> queued (see queue_delayed_work_on()), and we seem to be scheduling the
> same work (&kvm->arch.kvmclock_sync_work), which is per-kvm (not
> per-vcpu). Do we actually happen to finish executing it before the next
> vCPU is created, or why does the storm you describe happen?

I missed that. OK, let's just take patch 2/2 upstream.

    Wanpeng
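
For reference, the behaviour Vitaly describes can be modelled in user space. The sketch below is not kernel code: the model_* names and count_queued() are hypothetical, and the single "pending" flag only stands in for the pending bit that queue_delayed_work_on() tests and sets. It also assumes the work does not run while the vCPUs are still being created, which seems reasonable given the 300 * HZ delay.

```c
/* User-space model only; not kernel code. All model_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the per-VM delayed work (kvm->arch.kvmclock_sync_work). */
struct model_delayed_work {
	bool pending;	/* models the work item's pending bit */
};

/*
 * Models the contract of schedule_delayed_work(): returns false if the
 * work was already pending, true if this call queued it.
 */
static bool model_schedule_delayed_work(struct model_delayed_work *dwork)
{
	if (dwork->pending)
		return false;
	dwork->pending = true;
	return true;
}

/* Count how many of nr_vcpus post-create calls actually queue the work. */
static int count_queued(int nr_vcpus)
{
	struct model_delayed_work sync_work = { .pending = false };
	int queued = 0;

	assert(nr_vcpus >= 0);
	for (int i = 0; i < nr_vcpus; i++)
		if (model_schedule_delayed_work(&sync_work))
			queued++;
	return queued;
}
```

Under these assumptions, count_queued(64) returns 1: only the first call queues the per-VM work and the other 63 are no-ops, which is why the unpatched code does not produce one worker per vCPU.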