On Mon, Jun 22, 2020 at 11:39:49AM +0100, Marc Zyngier wrote: > On 2020-06-22 11:31, Andrew Jones wrote: > > On Mon, Jun 22, 2020 at 10:51:47AM +0100, Marc Zyngier wrote: > > > On 2020-06-22 09:41, Andrew Jones wrote: > > > > On Mon, Jun 22, 2020 at 09:20:02AM +0100, Marc Zyngier wrote: > > > > > Hi Andrew, > > > > > > > > > > On 2020-06-19 19:46, Andrew Jones wrote: > > > > > > arm64 requires a vcpu fd (KVM_HAS_DEVICE_ATTR vcpu ioctl) to probe > > > > > > support for steal time. However this is unnecessary and complicates > > > > > > userspace (userspace may prefer delaying vcpu creation until after > > > > > > feature probing). Since probing steal time only requires a KVM fd, > > > > > > we introduce a cap that can be checked. > > > > > > > > > > So this is purely an API convenience, right? You want a way to > > > > > identify the presence of steal time accounting without having to > > > > > create a vcpu? It would have been nice to have this requirement > > > > > before we merged this code :-(. > > > > > > > > Yes. I wish I had considered it more closely when I was reviewing the > > > > patches. And, I believe we have yet another user interface issue that > > > > I'm looking at now. Without the VCPU feature bit I'm not sure how easy > > > > it will be for a migration to fail when attempting to migrate from a > > > > host > > > > with steal-time enabled to one that does not support steal-time. So it's > > > > starting to look like steal-time should have followed the pmu pattern > > > > completely, not just the vcpu device ioctl part. > > > > > > Should we consider disabling steal time altogether until this is > > > worked out? > > > > I think we can leave it alone and just try to resolve it before merging > > QEMU patches (which I'm working on now). It doesn't look like kvmtool or > > rust-vmm (the only other two KVM userspaces I'm paying some attention > > to) > > do anything with steal-time yet, so they won't notice. And, I'm not sure > > disabling steal-time for any other userspaces is better than just trying > > to keep them working the best we can while improving the uapi. > > Is it only migration that is affected? Or do you see issues that would > affect non-migrating userspace? If we don't consider the need to check sched_info_on(), then other than the probing requiring a vcpu fd (which isn't a show stopper, just not super convenient), then I don't see any other issues for non-migrating userspace. > > [...] > > > > Accepting the pvtime attributes (setting up the per-vcpu area) has two > > > effects: we promise both the guest and userspace that we will provide > > > the guest with steal time. By not checking sched_info_on(), we lie to > > > both, with potential consequences. It really feels like a bug. > > > > Yes, I agree now. Again, following the pmu pattern looks best here. The > > pmu will report that it doesn't have the attr support when its > > underlying > > kernel support (perf counters) doesn't exist. That's a direct analogy > > with > > steal-time relying on sched_info_on(). > > Indeed. I'd be happy to take a fix early if you can spin one. I'll post just that patch now then. > > > I'll work up another version of this series doing that, but before > > posting > > I'll look at the migration issue a bit more and likely post something > > for > > that as well. > > OK. I'll park this series for now. Thanks, drew