Re: [PATCH 2/4] arm64/x86: KVM: Introduce steal time cap

Marc Zyngier <maz@xxxxxxxxxx> · Mon, 22 Jun 2020 11:39:49 +0100

On 2020-06-22 11:31, Andrew Jones wrote:
On Mon, Jun 22, 2020 at 10:51:47AM +0100, Marc Zyngier wrote:
On 2020-06-22 09:41, Andrew Jones wrote:
> On Mon, Jun 22, 2020 at 09:20:02AM +0100, Marc Zyngier wrote:
> > Hi Andrew,
> >
> > On 2020-06-19 19:46, Andrew Jones wrote:
> > > arm64 requires a vcpu fd (KVM_HAS_DEVICE_ATTR vcpu ioctl) to probe
> > > support for steal time. However this is unnecessary and complicates
> > > userspace (userspace may prefer delaying vcpu creation until after
> > > feature probing). Since probing steal time only requires a KVM fd,
> > > we introduce a cap that can be checked.
> >
> > So this is purely an API convenience, right? You want a way to
> > identify the presence of steal time accounting without having to
> > create a vcpu? It would have been nice to have this requirement
> > before we merged this code :-(.
>
> Yes. I wish I had considered it more closely when I was reviewing the
> patches. And, I believe we have yet another user interface issue that
> I'm looking at now. Without the VCPU feature bit I'm not sure how easy
> it will be for a migration to fail when attempting to migrate from a
> host
> with steal-time enabled to one that does not support steal-time. So it's
> starting to look like steal-time should have followed the pmu pattern
> completely, not just the vcpu device ioctl part.

Should we consider disabling steal time altogether until this is 
worked out?

I think we can leave it alone and just try to resolve it before merging
QEMU patches (which I'm working on now). It doesn't look like kvmtool 
or
rust-vmm (the only other two KVM userspaces I'm paying some attention 
to)
do anything with steal-time yet, so they won't notice. And, I'm not 
sure
disabling steal-time for any other userspaces is better than just 
trying
to keep them working the best we can while improving the uapi.

Is it only migration that is affected? Or do you see issues that would
affect non-migrating userspace?

[...]

Accepting the pvtime attributes (setting up the per-vcpu area) has two
effects: we promise both the guest and userspace that we will provide
the guest with steal time. By not checking sched_info_on(), we lie to
both, with potential consequences. It really feels like a bug.

Yes, I agree now. Again, following the pmu pattern looks best here. The
pmu will report that it doesn't have the attr support when its 
underlying
kernel support (perf counters) doesn't exist. That's a direct analogy 
with
steal-time relying on sched_info_on().

Indeed. I'd be happy to take a fix early if you can spin one.

I'll work up another version of this series doing that, but before 
posting
I'll look at the migration issue a bit more and likely post something 
for
that as well.

OK. I'll park this series for now.

Thanks,

        M.
--
Jazz is not dead. It just smells funny...