Re: [PATCH v6 3/8] KVM: Add support for using dirty ring in conjunction with bitmap

On Tue, 25 Oct 2022 18:47:12 +0100,
Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> 
> On Tue, Oct 25, 2022, Marc Zyngier wrote:
> > On Tue, 25 Oct 2022 01:24:19 +0100, Oliver Upton <oliver.upton@xxxxxxxxx> wrote:
> > > > That's why I asked if it's possible for KVM to require a dirty_bitmap when KVM
> > > > might end up collecting dirty information without a vCPU.  KVM is still
> > > > technically prescribing a solution to userspace, but only because there's only
> > > > one solution.
> > > 
> > > I was trying to allude to something like this by flat-out requiring
> > > ring + bitmap on arm64.
> > 
> > And I claim that this is wrong. It may suit a particular use case, but
> > that's definitely not a universal truth.
> 
> Agreed, KVM should not unconditionally require a dirty bitmap for arm64.
> 
> > > Otherwise, we'd either need to:
> > > 
> > >  (1) Document the features that explicitly depend on ring + bitmap (i.e.
> > >  GIC ITS, whatever else may come) such that userspace sets up the
> > >  correct configuration based on what its using. The combined likelihood
> > >  of both KVM and userspace getting this right seems low.
> > 
> > But what is there to get wrong? Absolutely nothing.
> 
> I strongly disagree.  On x86, we've had two bugs escape where KVM
> attempted to mark a page dirty without an active vCPU.
> 
>   2efd61a608b0 ("KVM: Warn if mark_page_dirty() is called without an active vCPU") 
>   42dcbe7d8bac ("KVM: x86: hyper-v: Avoid writing to TSC page without an active vCPU")
> 
> Call us incompetent, but I have zero confidence that KVM will never
> unintentionally add a path that invokes mark_page_dirty_in_slot()
> without a running vCPU.

Well, maybe it is time that KVM acknowledges there is a purpose to
dirtying memory outside of a vcpu context, and that if a write happens
in a vcpu context, this vcpu must be explicitly passed down rather
than obtained from kvm_get_running_vcpu(). Yes, this requires some
heavy surgery.
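[Editor's note: as a rough, self-contained model of the restructuring Marc describes, explicitly passing the vCPU down instead of fetching it via kvm_get_running_vcpu(). All types and names below are simplified stand-ins, not the real kernel API.]

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures involved. */
struct kvm_memory_slot {
	unsigned long dirty_bitmap;	/* toy bitmap: one word */
	unsigned long base_gfn;
};

struct kvm_vcpu {
	unsigned long ring_pushes;	/* counts pushes to this vCPU's ring */
};

/*
 * Dirty a gfn; @vcpu may be NULL for writes that legitimately happen
 * outside any vCPU context (e.g. the vgic ITS save path), in which
 * case the write goes to the bitmap instead of a per-vCPU ring.
 */
static void mark_page_dirty_explicit(struct kvm_vcpu *vcpu,
				     struct kvm_memory_slot *slot,
				     unsigned long gfn)
{
	unsigned long rel_gfn = gfn - slot->base_gfn;

	if (vcpu)
		vcpu->ring_pushes++;			/* vCPU context: use the ring */
	else
		slot->dirty_bitmap |= 1UL << rel_gfn;	/* no vCPU: fall back to bitmap */
}
```

The point of the exercise is that the caller, not kvm_get_running_vcpu(), decides which context the write belongs to, so a vCPU-less write is an explicit choice rather than a silent fallback.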

> By completely dropping the rule that KVM must have an active vCPU on
> architectures that support ring+bitmap, those types of bugs will go
> silently unnoticed, and will manifest as guest data corruption after
> live migration.

The elephant in the room is still userspace writing to its view of the
guest memory for device emulation. Do they get it right? I doubt it.

> And ideally such bugs would be detected without relying on userspace
> enabling dirty logging, e.g. the Hyper-V bug lurked for quite some
> time and was only found when mark_page_dirty_in_slot() started
> WARNing.
> 
> I'm ok if arm64 wants to let userspace shoot itself in the foot with
> the ITS, but I'm not ok dropping the protections in the common
> mark_page_dirty_in_slot().
> 
> One somewhat gross idea would be to let architectures override the
> "there must be a running vCPU" rule, e.g. arm64 could toggle a flag
> in kvm->arch in its kvm_write_guest_lock() to note that an expected
> write without a vCPU is in-progress:
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 8c5c69ba47a7..d1da8914f749 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -3297,7 +3297,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>         struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
>  
>  #ifdef CONFIG_HAVE_KVM_DIRTY_RING
> -       if (WARN_ON_ONCE(!vcpu) || WARN_ON_ONCE(vcpu->kvm != kvm))
> +       if (!kvm_arch_allow_write_without_running_vcpu(kvm) && WARN_ON_ONCE(!vcpu))
> +               return;
> +
> +       if (WARN_ON_ONCE(vcpu && vcpu->kvm != kvm))
>                 return;
>  #endif
>  
> @@ -3305,10 +3308,10 @@ void mark_page_dirty_in_slot(struct kvm *kvm,
>                 unsigned long rel_gfn = gfn - memslot->base_gfn;
>                 u32 slot = (memslot->as_id << 16) | memslot->id;
>  
> -               if (kvm->dirty_ring_size)
> +               if (kvm->dirty_ring_size && vcpu)
>                         kvm_dirty_ring_push(&vcpu->dirty_ring,
>                                             slot, rel_gfn);
> -               else
> +               else if (memslot->dirty_bitmap)
>                         set_bit_le(rel_gfn, memslot->dirty_bitmap);
>         }
>  }

I think this is equally wrong. Writes occur from both CPUs and devices
*concurrently*, and I don't see why KVM should keep ignoring this
pretty obvious fact.

Yes, your patch papers over the problem, and it can probably work if
the kvm->arch flag only gets set in the ITS saving code, which is
already exclusive of vcpus running.
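[Editor's note: a toy sketch of that flag-based approach, with the flag set only around the ITS save window. Names and types here are illustrative stand-ins, not actual KVM code.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct kvm {
	bool write_without_vcpu_ok;	/* stand-in for a kvm->arch flag */
	unsigned long bitmap;		/* toy per-VM dirty bitmap */
	unsigned long warns;		/* counts would-be WARN_ON_ONCE hits */
};

static bool kvm_arch_allow_write_without_running_vcpu(struct kvm *kvm)
{
	return kvm->write_without_vcpu_ok;
}

/* Returns true if the write was recorded, false if it tripped the WARN. */
static bool mark_page_dirty(struct kvm *kvm, void *vcpu, unsigned long rel_gfn)
{
	if (!vcpu && !kvm_arch_allow_write_without_running_vcpu(kvm)) {
		kvm->warns++;		/* unexpected vCPU-less write: keep warning */
		return false;
	}
	kvm->bitmap |= 1UL << rel_gfn;
	return true;
}

/* ITS save path: the flag is set only for the duration of the save,
 * which the thread notes is already exclusive of vCPUs running. */
static void its_save_tables(struct kvm *kvm)
{
	kvm->write_without_vcpu_ok = true;
	mark_page_dirty(kvm, NULL, 5);	/* table write, no vCPU context */
	kvm->write_without_vcpu_ok = false;
}
```

This preserves the WARN for genuinely unexpected vCPU-less writes while whitelisting the one path known to be safe, which is also why it cannot express truly concurrent CPU and device writes, the limitation Marc raises below.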

But in the long run, with dirty bits being collected from the IOMMU
page tables or directly from devices, we will need a way to reconcile
the dirty tracking. The above doesn't quite cut it, unfortunately.

	M.

-- 
Without deviation from the norm, progress is not possible.
