Re: [PATCH 07/50] KVM: PPC: Add generic single register ioctls

Alexander Graf <agraf@xxxxxxx> · Thu, 5 Jan 2012 03:32:27 +0100

On 04.01.2012, at 21:08, Scott Wood wrote:

> On 01/03/2012 07:10 PM, Alexander Graf wrote:
>> Right now we transfer a static struct every time we want to get or set
>> registers. Unfortunately, over time we realize that there are more of
>> these than we thought of before and the extensibility and flexibility of
>> transferring a full struct every time is limited.
>> 
>> So this is a new approach to the problem. With these new ioctls, we can
>> get and set a single register that is identified by an ID. This allows for
>> very precise and limited transmittal of data. When we later realize that
>> it's a better idea to shove over multiple registers at once, we can reuse
>> most of the infrastructure and simply implement a GET_MANY_REGS / SET_MANY_REGS
>> interface.
> 
> If we end up replacing get/set sregs with this (either totally or just
> for new registers), we should consider a runtime facility for userspace
> to extract from the kernel a list of supported registers (with sizes)
> that are supported, so that userspace doesn't have to know about every
> register in order to migrate (this could also be useful if there is
> non-architecturally-readable state that needs to be preserved, such as a
> pending book3e doorbell exception, or the MCAR to be set when machine
> checks are unmasked).  It could also avoid the need for explicit
> capabilities in some situations when new registers are added (like
> KVM_CAP_PPC_HIOR).

Interesting idea. That doesn't quite match up with the way we treat migration today with all fields getting copied into the CPUState and then taken on the wire from there, but I can see how a more generic approach feels more elegant.

Yes, the MANY_REGS interface should be really clever if we want to use it extensively, allowing us to only write a byte stream of exactly the register sizes we care about - no need to waste 128 bytes for every register - and with proper enumeration of available registers.

But to be honest I'm unsure on what the best interface is really. We have 3 different possible way of implementing register sharing between user space and kernel space:

  1) Current way with new CAP and struct enhancements or new structs every time we find something we didn't think about. I count ONE_REG in on this one.
  2) Something fancy like GET_MANY_REGS that would allow us to have a very flexible interface between kernel and user space for getting and setting registers, but we would still need an ioctl for them, so the registers shouldn't be read/set in the fast path anyway, at which point maybe using get/set individually with ONE_REG is still ok?
  3) Putting registers in the shared kvm_run struct so kernel and user space can access them; add a "dirty" bitmap in there so kernel space knows when a register is written from user space. This should be the preferred method for "fast path" registers - registers we need very often from user space.

Since we do have 1) today and 3) should be there for the fast path, I'm not sure how much we really do need 2). But the nice thing here is that we can cross that bridge when we get to it. This patch merely lays the groundwork.

>> The only downpoint I see to this one is that it needs to pad to 1024 bits
>> (hardware is already on 512 bit registers, so I wanted to leave some room)
>> which is slightly too much for transmitting only 64 bits. But if that's all
>> the tradeoff we have to do for getting an extensible interface, I'd say go
>> for it nevertheless.
> 
> Does it really make sense to consider these large things as single
> registers (even if that's how hw documents it)?  Do they need to be set
> atomically?  How do you get/set them in hardware if GPRs aren't that large?

For x86 SSE registers, you usually populate them using memory access IIRC, for altivec it's the same IIUC.

What would you suggest? Have 16 pseudo-registers for a single 1024bit register?

> 
>> +4.65 KVM_SET_ONE_REG
>> +
>> +Capability: KVM_CAP_ONE_REG
>> +Architectures: all
>> +Type: vcpu ioctl
>> +Parameters: struct kvm_one_reg (in)
>> +Returns: 0 on success, negative value on failure
>> +
>> +struct kvm_one_reg {
>> +       __u64 id;
>> +       union {
>> +               __u8 reg8;
>> +               __u16 reg16;
>> +               __u32 reg32;
>> +               __u64 reg64;
>> +               __u8 reg128[16];
>> +               __u8 reg256[32];
>> +               __u8 reg512[64];
>> +               __u8 reg1024[128];
>> +       } u;
>> +};
> 
> Do we need reg8/16/32, or can we simplify (and reduce potential size
> mismatch errors) by using __u64 for everything that doesn't need to be
> an array?

Hrm. Interesting idea. So you would basically reduce everything to __u64 by padding smaller registers and splitting bigger ones into separate IDs? That really is appealing, but might uglify the code quite a bit when remerging bigger registers. For the padding things should be good.

Btw, any reason you're only bringing up these really great ideas on my 2nd pull request after these patches were uncommented on the ML for quite a while? :)

> 
>> +/* Available with KVM_CAP_ONE_REG */
>> +
>> +#define KVM_ONE_REG_GENERIC		0x0000000000000000ULL
> 
> Generic registers?  Is the idea to use this in place of dedicated ioctls
> for certain KVM knobs?

Well, it felt logical to reserve some range for "other use".

> 
>> +/*
>> + * Architecture specific registers are to be defined in arch headers and
>> + * ORed with the arch identifier.
>> + */
>> +#define KVM_ONE_REG_PPC			0x1000000000000000ULL
>> +#define KVM_ONE_REG_X86			0x2000000000000000ULL
>> +#define KVM_ONE_REG_IA64		0x3000000000000000ULL
>> +#define KVM_ONE_REG_ARM			0x4000000000000000ULL
>> +#define KVM_ONE_REG_S390		0x5000000000000000ULL
> 
> Might want to allow space for more than 16 architectures -- there's room
> to spare.

I don't think we'll get to 16 KVM-enabled architectures anytime soon. And if we do, we can just declare the full high byte as arch mask and then we have plenty space :).

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html