Re: [PATCH 2/4] arm64/x86: KVM: Introduce steal time cap

Andrew Jones <drjones@xxxxxxxxxx> · Mon, 22 Jun 2020 13:04:07 +0200

On Mon, Jun 22, 2020 at 11:39:49AM +0100, Marc Zyngier wrote:
> On 2020-06-22 11:31, Andrew Jones wrote:
> > On Mon, Jun 22, 2020 at 10:51:47AM +0100, Marc Zyngier wrote:
> > > On 2020-06-22 09:41, Andrew Jones wrote:
> > > > On Mon, Jun 22, 2020 at 09:20:02AM +0100, Marc Zyngier wrote:
> > > > > Hi Andrew,
> > > > >
> > > > > On 2020-06-19 19:46, Andrew Jones wrote:
> > > > > > arm64 requires a vcpu fd (KVM_HAS_DEVICE_ATTR vcpu ioctl) to probe
> > > > > > support for steal time. However this is unnecessary and complicates
> > > > > > userspace (userspace may prefer delaying vcpu creation until after
> > > > > > feature probing). Since probing steal time only requires a KVM fd,
> > > > > > we introduce a cap that can be checked.
> > > > >
> > > > > So this is purely an API convenience, right? You want a way to
> > > > > identify the presence of steal time accounting without having to
> > > > > create a vcpu? It would have been nice to have this requirement
> > > > > before we merged this code :-(.
> > > >
> > > > Yes. I wish I had considered it more closely when I was reviewing the
> > > > patches. And, I believe we have yet another user interface issue that
> > > > I'm looking at now. Without the VCPU feature bit I'm not sure how easy
> > > > it will be for a migration to fail when attempting to migrate from a
> > > > host
> > > > with steal-time enabled to one that does not support steal-time. So it's
> > > > starting to look like steal-time should have followed the pmu pattern
> > > > completely, not just the vcpu device ioctl part.
> > > 
> > > Should we consider disabling steal time altogether until this is
> > > worked out?
> > 
> > I think we can leave it alone and just try to resolve it before merging
> > QEMU patches (which I'm working on now). It doesn't look like kvmtool or
> > rust-vmm (the only other two KVM userspaces I'm paying some attention
> > to)
> > do anything with steal-time yet, so they won't notice. And, I'm not sure
> > disabling steal-time for any other userspaces is better than just trying
> > to keep them working the best we can while improving the uapi.
> 
> Is it only migration that is affected? Or do you see issues that would
> affect non-migrating userspace?

If we don't consider the need to check sched_info_on(), then other than
the probing requiring a vcpu fd (which isn't a show stopper, just not
super convenient), then I don't see any other issues for non-migrating
userspace.

> 
> [...]
> 
> > > Accepting the pvtime attributes (setting up the per-vcpu area) has two
> > > effects: we promise both the guest and userspace that we will provide
> > > the guest with steal time. By not checking sched_info_on(), we lie to
> > > both, with potential consequences. It really feels like a bug.
> > 
> > Yes, I agree now. Again, following the pmu pattern looks best here. The
> > pmu will report that it doesn't have the attr support when its
> > underlying
> > kernel support (perf counters) doesn't exist. That's a direct analogy
> > with
> > steal-time relying on sched_info_on().
> 
> Indeed. I'd be happy to take a fix early if you can spin one.

I'll post just that patch now then.

> 
> > I'll work up another version of this series doing that, but before
> > posting
> > I'll look at the migration issue a bit more and likely post something
> > for
> > that as well.
> 
> OK. I'll park this series for now.

Thanks,
drew