[RFC] First (incomplete) cut of Xen paravirt binding

jeremy at goop.org (Jeremy Fitzhardinge) · Wed, 26 Jul 2006 23:37:52 -0700

Rusty Russell wrote:
> 	I want to make three changes to this over time:
>
> 1) Copy the ops structure in the asm, based on value of %ebx (0 == xen,
> etc).  Only copy the non-NULL entries, to make implementing ops simple
> (eg. Xen doesn't want to override all ops).  Xen wants %esi, so I might
> have to move that to %eax: I'll see how it works out.
>   

I'm coming to the conclusion that having separate entrypoints for each 
hypervisor is really the right way to go.  Assuming that all hypervisors 
have some way to set the entrypoint, it's reasonable to also assume 
they're each going to have a different method for doing so (ideally 
orthogonal to each other).  That means that if Xen does it with its 
__xen_guest string section and VMI does it some other (possibly similar) 
way, they can all get along.
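To make the separate-entrypoint idea concrete, here is a rough C simulation. It assumes each hypervisor jumps straight to its own entry symbol. In the real kernel these would be assembly stubs in head.S. The two-field paravirt_ops, xen_entry and start_kernel_common are hypothetical simplifications, and startup_32 is only a stand-in for the normal path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-field stand-in for the real paravirt_ops table. */
struct paravirt_ops {
	const char *name;
	void (*irq_disable)(void);
};

static int irq_disable_calls;	/* demo counter, not kernel state */

static void native_irq_disable(void) { irq_disable_calls += 1; }
static void xen_irq_disable(void)    { irq_disable_calls += 10; }

struct paravirt_ops paravirt_ops = {
	.name        = "bare hardware",
	.irq_disable = native_irq_disable,
};

/* Common boot code: it never asks which hypervisor it is under. */
static void start_kernel_common(void)
{
	assert(paravirt_ops.irq_disable != NULL);
	paravirt_ops.irq_disable();
}

/* Stand-in for the normal startup_32 path: native ops stay in place. */
void startup_32(void)
{
	start_kernel_common();
}

/* Hypothetical Xen entrypoint.  Xen finds it through its own mechanism
 * (the __xen_guest section), so this code already knows it is under
 * Xen.  No %ebx convention, no probing, and it installs its ops
 * directly rather than through an indirect ->init hook. */
void xen_entry(void)
{
	paravirt_ops.name        = "xen";
	paravirt_ops.irq_disable = xen_irq_disable;
	start_kernel_common();
}
```

A VMI entrypoint would be a third function of the same shape, located through whatever mechanism VMI uses.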

It isn't clear to me who sets %ebx in your proposal.  If you're 
suggesting that the hypervisors do it, it seems a bit presumptuous 
to have a specific mechanism just for us.  If you're saying that a 
common PV startup function needs to try to sniff out which hypervisor 
it's under, that seems very tricky, particularly since we can't take any 
faults at that point.

It's also possible that a hypervisor is fully virtualizing, so that boot 
proceeds via the normal startup_32 path, but at some later point the 
guest can register some pv_ops for better performance.  (Similar to your 
idea for an in-kernel modular hypervisor.)
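Here is a minimal sketch of that late-registration case, assuming the kernel booted natively and a hypervisor-specific module later hands over a replacement table. paravirt_register_late and hv_ops are hypothetical names. The sketch deliberately ignores the hard part, which is switching ops safely on a running (possibly SMP) system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-op stand-in for the paravirt_ops table. */
struct paravirt_ops {
	const char *name;
	void (*flush_tlb)(void);
};

static int native_flushes, hv_flushes;	/* demo counters */

static void native_flush_tlb(void) { native_flushes++; }
static void hv_flush_tlb(void)     { hv_flushes++; }

/* Booted through the normal startup_32 path, so these are native. */
static struct paravirt_ops paravirt_ops = {
	.name      = "native",
	.flush_tlb = native_flush_tlb,
};

static bool pv_ops_replaced;

/* Hypothetical late-registration hook.  Real code would have to run
 * this while only one CPU is up (or otherwise quiesce the system);
 * this sketch does not attempt that. */
int paravirt_register_late(const struct paravirt_ops *ops)
{
	assert(ops != NULL && ops->flush_tlb != NULL);
	if (pv_ops_replaced)
		return -1;	/* refuse a second replacement */
	paravirt_ops = *ops;
	pv_ops_replaced = true;
	return 0;
}

/* What a hypervisor-specific module might hand over once it has
 * detected the hypervisor it is running under. */
static const struct paravirt_ops hv_ops = {
	.name      = "example-hv",
	.flush_tlb = hv_flush_tlb,
};
```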

At some future point, it would be nice to be able to load replacement 
paravirt ops implementations via multiboot/grub modules (or some similar 
mechanism), so that the interface to the hypervisor can be updated for 
old guests.  I envision that it would steal control away from the 
in-kernel paravirt code by redirecting the entrypoint, install a new 
pv_ops structure and then boot as normal (I haven't investigated this at 
all; this is just the most plausible-sounding mechanism I thought of).  
This would be the moral equivalent of compiling a new scsi driver for an 
old kernel in order to support new hardware.  It's similar to the idea 
that VMI can replace the ROM from boot to boot, but at the level of a 
source API rather than a fixed long-term ABI.

I also considered the idea of having NULL pointers in the pv_ops 
structure and only copying non-NULL pointers, but I decided against it.  
It seems cleaner to me to explicitly set the pointer to the nopara_ 
function, so that you can easily look at the structure and see which 
functions have been implemented and which have been forgotten.
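Here is a small sketch of the explicit-defaults style, using a hypothetical three-op subset of the table. Which ops the "xen" table overrides here is purely illustrative. The point is that every slot is written out, so a slot nobody thought about stays NULL and can be caught by a simple check. Under the copy-only-non-NULL scheme, that same slot would silently inherit the native op instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-op subset of paravirt_ops. */
struct paravirt_ops {
	void (*load_idt)(const void *dtr);
	void (*flush_tlb)(void);
	void (*halt)(void);
};

/* nopara_ versions: what bare hardware would do (bodies elided). */
static void nopara_load_idt(const void *dtr) { (void)dtr; }
static void nopara_flush_tlb(void)           { }
static void nopara_halt(void)                { }

/* Hypervisor-specific overrides (bodies elided). */
static void xen_load_idt(const void *dtr)    { (void)dtr; }
static void xen_flush_tlb(void)              { }

static const struct paravirt_ops native_paravirt_ops = {
	.load_idt  = nopara_load_idt,
	.flush_tlb = nopara_flush_tlb,
	.halt      = nopara_halt,
};

/* Every slot is spelled out: reading the table shows which ops are
 * implemented and which are deliberately left native. */
static const struct paravirt_ops xen_paravirt_ops = {
	.load_idt  = xen_load_idt,
	.flush_tlb = xen_flush_tlb,
	.halt      = nopara_halt,
};

/* Boot-time sanity check: returns 1 only if no slot was left unset. */
static int paravirt_ops_complete(const struct paravirt_ops *ops)
{
	return ops->load_idt != NULL &&
	       ops->flush_tlb != NULL &&
	       ops->halt != NULL;
}
```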

> 2) Call *paravirt_ops.init rather than hardcoded xen_start_kernel.
>   

That seems particularly pointless.  By the time you need to call it, you 
already know which hypervisor you're under so you could just call it 
directly.  Since there's not much common code between the various 
hypervisor startups (not much code at all, full-stop), there doesn't 
seem to be much scope for usefully sharing code.

> 3) Rename from xen-head.S to paravirt-head.S.
>   

My plan was that there would be a paravirt-foo directory for each 
hypervisor, and a corresponding foo-specific entrypoint in head.S, 
pulled in by including foo-head.S.

>> I also haven't really gone over the list of paravirt ops in detail to 
>> see if they're really what we want; I figure that will come up as I keep 
>> adapting Xen to the interface.  But an obvious one seems to be that we should 
>> have explicit flush_tlb/multicast_flush_tlb calls rather than simply 
>> relying on reloading cr3.
>>     
>
> Yep, and I thought about set_tss_desc, rather than lower-level ops,
> because Xen doesn't want it at all.  But see how you go..
>   
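The explicit TLB ops can be sketched in C as follows. Counters stand in for real hardware and hypercalls, and all names other than flush_tlb/multicast_flush_tlb are hypothetical. The assumed motivation is this: if the only hook is a cr3 write, a hypervisor cannot tell a flush from an address-space switch, and a cross-CPU flush becomes a round of IPIs, each followed by another cr3 reload. An explicit op lets the hypervisor do the whole job in a single call.

```c
#include <assert.h>

typedef unsigned long cpumask_t;	/* simplified: bit n set => CPU n */

struct paravirt_ops {
	void (*flush_tlb)(void);			/* this CPU only   */
	void (*multicast_flush_tlb)(cpumask_t cpus);	/* the given CPUs  */
};

/* --- native: everything is expressed as a %cr3 reload ------------- */
static unsigned long cr3 = 0x1000;	/* simulated register */
static int cr3_reloads;			/* one per local flush */
static int ipis_sent;			/* simulated cross-CPU interrupts */

static void native_flush_tlb(void)
{
	unsigned long v = cr3;	/* stand-in for read_cr3() */
	cr3 = v;		/* stand-in for write_cr3(): drops non-global entries */
	cr3_reloads++;
}

static void native_multicast_flush_tlb(cpumask_t cpus)
{
	/* Assume CPU 0 is the current CPU: flush locally, IPI the rest,
	 * and each target then reloads its own %cr3 (simulated here). */
	for (unsigned int cpu = 0; cpu < 8 * sizeof(cpus); cpu++) {
		if (!(cpus & (1UL << cpu)))
			continue;
		if (cpu != 0)
			ipis_sent++;
		native_flush_tlb();
	}
}

/* --- hypervisor: one (simulated) hypercall names the work --------- */
static int hypercalls;
static cpumask_t last_mask;

static void xen_flush_tlb(void)
{
	hypercalls++;
	last_mask = 1UL;	/* just this CPU */
}

static void xen_multicast_flush_tlb(cpumask_t cpus)
{
	hypercalls++;
	last_mask = cpus;
}

static const struct paravirt_ops native_ops = {
	.flush_tlb           = native_flush_tlb,
	.multicast_flush_tlb = native_multicast_flush_tlb,
};

static const struct paravirt_ops xen_ops = {
	.flush_tlb           = xen_flush_tlb,
	.multicast_flush_tlb = xen_multicast_flush_tlb,
};
```

Flushing CPUs 0 to 2 costs two IPIs and three cr3 reloads natively, versus one hypercall with the explicit op.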

Yeah.  I was just looking at load_idt, which is pretty strange.  The Xen 
version of it ignores the argument and always loads its own exception table.

I'm thinking that if we need it at all, it should be called something 
like load_exceptions(void).  Perhaps it should take the argument, but 
only do something if it == &idt_descr...
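Here are both options side by side in a rough sketch. struct desc_ptr is a simplified descriptor-table pointer, and xen_load_exceptions and other_descr are hypothetical names. The counter stands in for whatever hypercall installs Xen's own trap table.

```c
#include <assert.h>

/* Simplified descriptor-table pointer, as passed to lidt. */
struct desc_ptr {
	unsigned short size;
	unsigned long address;
};

struct desc_ptr idt_descr;		/* the kernel's main IDT descriptor */
static struct desc_ptr other_descr;	/* e.g. some temporary table */

static int xen_trap_table_loads;	/* simulated hypercalls */

/* Option 1: no argument, named for what it really does. */
static void xen_load_exceptions(void)
{
	xen_trap_table_loads++;		/* install Xen's own trap table */
}

/* Option 2: keep load_idt's signature, but act only on the main IDT;
 * any other table is ignored. */
static void xen_load_idt(const struct desc_ptr *dtr)
{
	if (dtr == &idt_descr)
		xen_load_exceptions();
}
```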

    J