Re: [PATCH v9 3/5] KVM: stats: Add documentation for statistics data binary interface

On 16/06/21 20:18, Greg KH wrote:
On Wed, Jun 16, 2021 at 06:59:15PM +0200, Paolo Bonzini wrote:
- varlink structs are encoded as JSON dictionaries.  Therefore, every time
userspace reads the fields, the kernel has to include the field names as
JSON dictionary keys.  This means that a lot of time is spent writing
buffers, and on the receiving side parsing them.

Has this been measured?  Years ago when I messed with this, it was in
the noise as JSON parsing is really fast these days.

Yes, parsing is really fast. However, the work doesn't end at building an in-memory representation. An efficient representation (and a schema that is negotiated in advance) makes it possible to do this work as late and as little as possible, instead of doing it on every fetch of the statistics.

Cloud vendors running virtual machines want to consolidate housekeeping tasks on as few CPUs as possible (because housekeeping CPUs cannot be billed to customers), and every millisecond really counts. ASCII is inviting, but things like Protobufs, FlatBuffers and Cap'n Proto are all binary because all those hidden costs do exist.

- because numeric data has to be converted to ASCII the output doesn't have
fixed offsets, so it is not possible to make an efficient implementation of
pread.

efficient where?  In the kernel?

Yes, Jing's patches can just do a quick copy_to_user if pread does not access the schema. And it's very simple code too.
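
A minimal sketch of the fixed-offset argument, in Python for brevity. The layout here (a bare array of little-endian u64 counters, no header) and all function names are assumptions made up for the illustration; this is not the layout used by Jing's patches. With binary values, counter i always sits at byte i * 8, so a reader can fetch one value with a single pread. With ASCII, the offset of each value depends on how many digits the previous values happen to have.

```python
import os
import struct
import tempfile

VALUE_SIZE = 8  # assumption: every statistic is one little-endian u64


def write_stats(fd, values):
    """Write all counters back to back, so counter i lives at offset i * 8."""
    os.pwrite(fd, struct.pack(f"<{len(values)}Q", *values), 0)


def read_stat(fd, index):
    """Fetch a single counter with one pread at a computed offset."""
    data = os.pread(fd, VALUE_SIZE, index * VALUE_SIZE)
    return struct.unpack("<Q", data)[0]


def ascii_offsets(values):
    """Offsets of each value if they were written as decimal text, one per line."""
    offsets, pos = [], 0
    for v in values:
        offsets.append(pos)
        pos += len(str(v)) + 1  # digits plus the newline
    return offsets


def demo():
    values = [7, 2**64 - 1, 123456789]
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        write_stats(fd, values)
        return [read_stat(fd, i) for i in range(len(values))]
```

In the text encoding the offsets change whenever a counter gains a digit, so a reader has to re-scan from the start on every fetch.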

- even though Varlink specifies that int is "usually int64", a little-known
gem is that JSON behavior for numbers not representable as a double (i.e.
exceeding 2^53) is implementation-defined

That's interesting, do the varlink developers know this?  And we can say
"for the kernel int will be int64" and be done with it, so this
shouldn't be that big of an issue.

Well yeah, but there's still the problem of what the other side thinks. In the end varlink's interesting because it's just JSON, meaning there are plenty of parsers available, but all too often they don't distinguish int from double. We had this issue with projects talking to QEMU (which has been using JSON the same way as varlink for ten years or so) and JSON parsers returning an overflow for 2^64-1 (because it rounds to 2^64) or an incorrect value. I'm not saying it's a showstopper, it's just an unavoidable ugliness if you pick JSON.
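
The rounding is easy to reproduce. The sketch below uses Python's standard json module; the as_double_parser helper is a stand-in, made up for this example, for a parser that stores every number as a double. Python's own parser keeps exact integers, which is precisely why the two sides of a connection can disagree.

```python
import json

U64_MAX = 2**64 - 1


def as_double_parser(text):
    """Parse JSON the way a parser without a separate integer type would."""
    return json.loads(text, parse_int=float)


def demo():
    text = json.dumps({"value": U64_MAX})
    exact = json.loads(text)["value"]          # Python keeps an exact int
    rounded = as_double_parser(text)["value"]  # nearest double is 2^64
    return exact, rounded
```

Here demo() returns the exact integer alongside a float equal to 2^64, one more than the value that was sent. The same loss starts at 2^53 + 1, which a double rounds down to 2^53.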

For the schema, there are some specific problems with varlink, but also a
more generic issue.  The specific problems are:

- the schema doesn't include the length of arrays.  This makes it hard to
compute in advance lengths and offsets of fields (even ignoring the fact
that data is not binary, which I'll get to later)

Do you care in advance?

Yes, once we add for example histograms we would like to include in the schema the size and number of the buckets.

Again, I didn't think this was an issue with the kernel implementation
in that the userspace side could determine the schema by the data coming
from the kernel, it wouldn't have to "know" about it ahead of time.
But I could be wrong.

No, you're right. The C implementations are really just very thin wrappers over JSON. There's very little "Varlink"ness in them.

However, the interesting part of the schema is the metadata: the unit, whether something is an instant vs. a cumulative value, the bucket size when we add histograms. These things are obviously not included in the data and must be communicated separately. Userspace tools could also use a schema to validate user requests ("record the halt_poll_fail_ns every second").
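
As a rough illustration of what such metadata buys a userspace tool, here is a hypothetical descriptor table in Python. The field names, the Kind enum, the example_hist entry and the classification of halt_poll_fail_ns as cumulative are all assumptions for the sketch, not the format proposed in the series. Once each descriptor carries a unit, a kind and a value count, the tool can both validate a request and compute every field's offset in advance.

```python
from dataclasses import dataclass
from enum import Enum

VALUE_SIZE = 8  # assumption: each value is a u64


class Kind(Enum):
    INSTANT = "instant"
    CUMULATIVE = "cumulative"
    HISTOGRAM = "histogram"


@dataclass(frozen=True)
class StatDesc:
    name: str
    unit: str
    kind: Kind
    num_values: int = 1   # number of buckets for a histogram
    bucket_size: int = 0  # only meaningful for histograms


# Hypothetical schema; example_hist is invented for the illustration.
SCHEMA = (
    StatDesc("halt_poll_fail_ns", "ns", Kind.CUMULATIVE),
    StatDesc("example_hist", "ns", Kind.HISTOGRAM, num_values=8, bucket_size=1000),
)


def offsets(schema):
    """Byte offset of each statistic, plus the total size of the data block."""
    result, pos = {}, 0
    for desc in schema:
        result[desc.name] = pos
        pos += desc.num_values * VALUE_SIZE
    return result, pos


def validate_request(schema, name, interval_s):
    """Check a 'record <name> every <interval_s> seconds' request."""
    by_name = {d.name: d for d in schema}
    if name not in by_name:
        raise KeyError(f"unknown statistic: {name}")
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return by_name[name]
```

For example, validate_request(SCHEMA, "halt_poll_fail_ns", 1) returns the descriptor, so the tool knows the unit to print without looking at the data itself.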

All that said, what we _could_ do is serialize the schema as JSON
instead of using a binary format

It should be in some standard format. If not, and it sounds like you
have looked into it, or at least the userspace side, then that's fine.
But you should write up a justification somewhere why you didn't use an
existing format (what about the netlink format?)

I guess you're talking about NETLINK_GENERIC, which also has the issue that the schema (the attributes) is not dynamic but rather part of the uAPI. We explicitly don't want the statistics to be stable; they're like tracepoints in that respect, and that's why we took ideas from trace-cmd. Anyway, as a start Jing will summarize all these discussions in v10.

Thanks,

Paolo



