Re: [PATCH 1/9] KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls

Paul Mackerras <paulus@xxxxxxxxx> · Fri, 5 Apr 2013 09:38:58 +1100

On Thu, Apr 04, 2013 at 11:49:55AM +0200, Alexander Graf wrote:
> 
> On 04.04.2013, at 07:37, Paul Mackerras wrote:
> 
> > On Thu, Mar 21, 2013 at 09:52:16AM +0100, Alexander Graf wrote:
> >>> +/* Platform specific hcalls, used by KVM */
> >>> +#define H_RTAS			0xf000
> >> 
> >> How about you define a different hcall ID for this? Then QEMU would
> >> create its "rtas entry blob" such that KVM-routed RTAS handling goes
> >> to KVM directly.
> > 
> > QEMU can still do that, and I don't see that it would change the
> > kernel side if it did.  We would still have to have agreement between
> > the kernel and userspace as to what the hcall number for invoking the
> > in-kernel RTAS calls was, and the kernel would still have to keep a
> > list of token numbers and how they correspond to the functions it
> > provides.  The only thing different would be that the in-kernel RTAS
> > hcall could return to the guest if it didn't recognize the token
> > number, rather than pushing the problem up to userspace.  However,
> > that wouldn't make the code any simpler, and it isn't a situation
> > where performance is an issue.
> > 
> > Do you see some kernel-side improvements or simplifications from your
> > suggestion that I'm missing?  Remember, the guest gets the token
> > numbers from the device tree (properties under the /rtas node), so
> > they are under the control of userspace/QEMU.
> 
> The code flow with this patch:
> 
>   <setup time>
> 
>   foreach (override in overrides)
>     ioctl(OVERRIDE_RTAS, ...);
> 
>   <runtime>
> 
>   switch (hcall_id) {
>   case QEMU_RTAS_ID:
>     foreach (override in kvm_overrides) {
>       int rtas_id = ...;
>       if (override.rtas_id == rtas_id) {
>         handle_rtas();

Actually this is more like: override.handler();

>         handled = true;
>       }
>     }
>     if (!handled)
>       pass_to_qemu();
>     break;
>   default:
>     pass_to_qemu();
>     break
>   }
> 
> What I'm suggesting:
> 
>   <setup time>
> 
>   nothing from KVM's point of view

Actually, this can't be "nothing".

The way the RTAS calls work is that there is a name and a "token"
(32-bit integer value) for each RTAS call.  The tokens have to be
unique for each different name.  Userspace puts the names and tokens
in the device tree under the /rtas node (a set of properties where the
property name is the RTAS function name and the property value is the
token).  The guest looks up the token for each RTAS function it wants
to use, and passes the token in the argument buffer for the RTAS call.

This means that userspace has to know the names and tokens for all
supported RTAS functions, both the ones implemented in the kernel and
the ones implemented in userspace.

Also, the token numbers are pretty arbitrary, and the token numbers
for the kernel-implemented RTAS functions could be chosen by userspace
or by the kernel.  If they're chosen by the kernel, then userspace
needs a way to discover them (so it can put them in the device tree),
and also has to avoid choosing any token numbers for its functions
that collide with a kernel-chosen token.  If userspace chooses the
token numbers, it has to tell the kernel what token numbers it has
chosen for the kernel-implemented RTAS functions.  We chose the latter
since it gives userspace more control.

So this <setup time> code has to be either (your suggestion):

    foreach RTAS function possibly implemented in kernel {
        query kernel token for function, by name
	if that gives an error, mark function as needing to be
	        implemented in userspace
    }
    (userspace) allocate tokens for remaining functions,
                avoiding collisions with kernel-chosen tokens

or else it is (my suggestion):

    (userspace) allocate tokens for all RTAS functions
    foreach RTAS function possibly implemented in kernel {
        tell kernel the (name, token) correspondence
    }

>   <runtime>
> 
>   switch (hcall_id) {
>   case KVM_RTAS_ID:
>     handle_rtas();

Here, you've compressed details that you expanded in your pseudo-code
above, making this a less than fair comparison.  This handle_rtas()
function has to fetch the token and branch out to the appropriate
handler routine.  Whether that's a switch statement or a loop over
registered handlers doesn't make all that much difference.

>     break;
>   default:
>     pass_to_qemu();
>     break;
>   }
> 
> 
> Which one looks easier and less error prone to you? :)
> 
> Speaking of which, how does user space know that the kernel actually
> supports a specific RTAS token? 

It's really the names that are more important, the tokens are pretty
arbitrary.  In my scheme, userspace does a KVM_PPC_RTAS_DEFINE_TOKEN
ioctl giving the name and the (userspace-chosen) token, which gets an
error if the kernel doesn't recognize the name.  In your scheme, there
would have to be an equivalent ioctl to query the (kernel-chosen)
token for a given name, which once again would return an error if the
kernel doesn't recognize the name.  Either way the kernel has to have
a list of names that it knows about.

Paul.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html