On Thu, Jul 14, 2022 at 08:48:04PM +0000, Sean Christopherson wrote:
> On Thu, Jul 14, 2022, Peter Xu wrote:
> > Hi, Shivam,
> >
> > On Tue, Jul 05, 2022 at 12:51:01PM +0530, Shivam Kumar wrote:
> > > Hi, here's a summary of what needs to be changed and what should be
> > > kept as it is (purely my opinion based on the discussions we have had
> > > so far):
> > >
> > > i) Moving the dirty quota check to mark_page_dirty_in_slot. Use kvm
> > > requests in the dirty quota check. I hope that the ceiling-based
> > > approach, with proper documentation and an ioctl exposed for resetting
> > > 'dirty_quota' and 'pages_dirtied', is good enough. Please post your
> > > suggestions if you think otherwise.
> >
> > An ioctl just for this seems like overkill to me.
> >
> > Currently you expose only "quota" in kvm_run, and then on vmexit the
> > exit fields contain both "quota" and "count".  I've always thought
> > that's a bit redundant.
> >
> > What I'm thinking is:
> >
> >   (1) Expose both "quota" and "count" in kvm_run, then:
> >
> >       "quota" should only be written by userspace and read by the kernel.
> >       "count" should only be written by the kernel and read by userspace. [*]
> >
> >       [*] One special case is when userspace finds there's a risk of
> >       quota & count overflow; then userspace should:
> >
> >         - Kick the vcpu out (so the kernel won't write to "count" anymore)
> >         - Update both "quota" and "count" to safe values
> >         - Resume KVM_RUN
> >
> >   (2) When the quota is reached, we don't need to copy quota/count into
> >       the vmexit fields, since userspace can read the real-time values
> >       in kvm_run.
> >
> > Would this work?
>
> Technically, yes; practically speaking, no.  If KVM doesn't provide the
> quota that _KVM_ saw at the time of exit, then there's no sane way to
> audit KVM exits due to KVM_EXIT_DIRTY_QUOTA_EXHAUSTED.  Providing the
> quota ensures userspace sees sane, coherent data if there's a race
> between KVM checking the quota and userspace updating the quota.  If KVM
> doesn't provide the quota, then userspace can see an exit with
> "count < quota".

That is a rare false positive which should be acceptable in this case
(the same as a vmexit with count == quota when we had just planned to
boost the quota).  IMHO it's better than always kicking the vcpu, since
the overhead for such a false positive is only a vmexit and nothing else.

>
> Even if userspace is ok with such races, it will be extremely difficult
> to detect KVM issues if we mess something up, because such behavior
> would have to be allowed by KVM's ABI.

Could you elaborate?  We have quite a few places sharing fields like
these between user and kernel in kvm_run, no?

>
> > > ii) The change in VMX's handle_invalid_guest_state() remains as it is.
> > > iii) For now, we are lazily updating the dirty quota, i.e. we are
> > > updating it only when the vcpu exits to userspace with the exit reason
> > > KVM_EXIT_DIRTY_QUOTA_EXHAUSTED.
> >
> > At least with the above design, IMHO we can update kvm_run.quota in
> > real time without kicking the vcpu threads (the quota only increases,
> > except in the reset phase for overflow).  Or is there any other reason
> > you need to kick the vcpu out when updating the quota?
>
> I'm not convinced overflow is actually possible.

Yeah, on that part I second you.  I don't really think it'll even happen
in practice, but I'd still like to make sure we won't introduce an ioctl
for the overflow case only.

-- 
Peter Xu
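
[A minimal C sketch of the quota/count split and the overflow-reset
sequence Peter describes above.  The field names (dirty_quota,
pages_dirtied) and the kick/resume helpers are hypothetical placeholders
for illustration, not taken from the actual series:]

#include <stdint.h>

/*
 * Hypothetical stand-ins for the two kvm_run fields under discussion:
 * userspace writes the quota, the kernel writes the running count.
 */
struct dirty_quota_shared {
	uint64_t dirty_quota;   /* written by userspace, read by the kernel */
	uint64_t pages_dirtied; /* written by the kernel, read by userspace */
};

/*
 * Placeholder for however the VMM forces the vcpu thread out of KVM_RUN
 * (e.g. a signal to the vcpu thread).
 */
static void kick_vcpu(void)
{
}

/* Placeholder for letting the vcpu thread re-enter KVM_RUN. */
static void resume_vcpu(void)
{
}

/*
 * The special case [*] from the proposal: if userspace sees a risk of
 * the counter overflowing, it kicks the vcpu first (so the kernel stops
 * writing pages_dirtied), rewrites both fields to safe values, and only
 * then resumes KVM_RUN.
 */
static void reset_on_overflow_risk(struct dirty_quota_shared *dq,
				   uint64_t new_quota)
{
	kick_vcpu();
	dq->pages_dirtied = 0;
	dq->dirty_quota = new_quota;
	resume_vcpu();
}

[In the common, non-overflow path userspace only ever raises dirty_quota,
which is why Peter argues the quota can be updated in real time without
kicking the vcpu at all.]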