On 20.01.20 19:57, milanpa@xxxxxxxxxx wrote:
On 1/20/20 6:53 PM, Alexander Graf wrote:
On 18.01.20 00:38, Paolo Bonzini wrote:
On 15/01/20 15:59, Alexander Graf wrote:
On 15.01.20 15:43, milanpa@xxxxxxxxxx wrote:
Let's expose new interface to userspace for garhering these
statistics with one ioctl.
Userspace application can read counter description once using
KVM_GET_SUPPORTED_DEBUGFS_STAT and periodically invoke the
KVM_GET_DEBUGFS_VALUES to get value update.
This is an interface that requires a lot of logic and buffers from
user space to retrieve individual, explicit counters. What if I just
wanted to monitor the number of exits on every user space exit?
In case we want to cover such latency sensitive use cases solution b),
with mmap'ed structs you suggested, would be a way to go, IMO.
Also, we're suddenly making the debugfs names a full ABI, because
through this interface we only identify the individual stats through
their names. That means we can not remove stats or change their
names, because people may rely on them, no? Thining about this again,
maybe they already are an ABI because people rely on them in debugfs
though?
In theory not, in practice I have treated them as a kind of "soft" ABI:
if the meaning changes you should rename them, and removing them is
fine, but you shouldn't for example change the unit of measure (which is
not hard since they are all counters :) but perhaps you could have
nanoseconds vs TSC cycles in the future for some counters).
I see two alternatives to this approach here:
a) ONE_REG
We can just add a new DEBUG arch in ONE_REG and expose ONE_REG per VM
as well (if we really have to?). That gives us explicit identifiers
for each stat with an explicit path to introduce new ones with very
unique identifiers.
ONE_REG would force us to define constants for each counter, and would
make it hard to retire them. I don't like this.
Why does it make it hard to retire them? We would just return -EINVAL
on retrieval, like we do for any other non-supported ONE_REG.
It's the same as a file not existing in debugfs/statfs. Or an entry in
the array of this patch to disappear.
b) part of the mmap'ed vcpu struct
Same here. Even if we say the semantics of the struct would be exposed
to userspace via KVM_GET_SUPPORTED_DEBUGFS_STAT, someone might end up
getting this wrong and expecting a particular layout. Milan's proposed
API has the big advantage of being hard to get wrong for userspace. And
pushing the aggregation to userspace is not a huge chore, but it's still
a chore.
So unless someone has a usecase for latency-sensitive monitoring I'd
prefer to keep it simple (usually these kind of stats can even make
sense if you gather them over relatively large period of time, because
then you'll probably use something else like tracepoints to actually
pinpoint what's going on).
I tend to agree. Fetching them via an explicit call into the kernel is
definitely the safer route.
2) vcpu counters
Most of the counters count on vcpu granularity, but debugfs only
gives us a full VM view. Whatever we do to improve the situation, we
should definitely try to build something that allows us to get the
counters per vcpu (as well).
The main purpose of these counters is monitoring. It can be quite
important to know that only a single vCPU is going wild, compared to
all of them for example.
I agree, exposing per vcpu counters can be useful. I guess it didn't
make much sense exposing them through debugfs so aggregation was done
in kernel. However if we chose to go with approach 1-b) mmap counters
struct in userspace, we could do this.
The reason I dislike the debugfs/statfs approach is that it generates a
completely separate permission and access paths to the stats. That's
great for full system monitoring, but really bad when you have multiple
individual tenants on a single host.
I agree, anything in sysfs is complementary to vmfd/vcpufd access.
Cool :).
So we only need to agree on ONE_REG vs. this patch mostly?
What about extending KVM_GET_SUPPORTED_DEBUGFS_STAT with some additional
information like the data type and unit? ONE_REG exposes additional
semantics about data.
It's not a problem of exposing the type information - we have that today
by implicitly saying "every counter is 64bit".
The thing I'm worried about is that we keep inventing these special
purpose interfaces that really do nothing but transfer numbers in one
way or another. ONE_REG's purpose was to unify them. Debug counters
really are the same story.
Alex
Amazon Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
Sitz: Berlin
Ust-ID: DE 289 237 879