[RFC] First (incomplete) cut of Xen paravirt binding

zach at vmware.com (Zachary Amsden) · Thu, 27 Jul 2006 22:28:22 -0700

Rusty Russell wrote:
>> I'm coming to the conclusion that having separate entrypoints for each 
>> hypervisor is really the right way to go.  Assuming that all hypervisors 
>> have some way to set the entrypoint, then it's reasonable to also assume 
>> they're each going to have a different method for doing so (ideally 
>> orthogonal to each other).  That means that if Xen does it with its 
>> __xen_guest string section and VMI does it some other (possibly similar) 
>> way, they can all get along.
>>     
>
> Hi Jeremy,
>
> 	I completely disagree with this.  While it's great that Xen has a way
> of encapsulating this information, there's no good reason not to have a
> (Linux) standard way of doing so.  Of course, it should work with Xen,
> too, which is why it's written the way it is...
>
> 	Remember, my goal differs substantially from yours and Zach's: it is to
> make writing a new hypervisor interface as simple as possible.
>   

Hey, that's my goal too.  I think we all want to wear that hat during 
this process ;)
>   
>> It isn't clear to me who sets %ebx in your proposal.  If you're 
>> suggesting that the hypervisors do it, it seems a bit presumptuous 
>> to have a specific mechanism just for us.  If you're saying that a 
>> common PV startup function needs to try to sniff what hypervisor it's 
>> under, that seems very tricky, particularly since we can't take any 
>> faults at that point.
>>     
>
> 	Naah, the hypervisor sets %ebx.  That's 0 for you, so it's all fine.
>   

I think we should follow the multiboot protocol as closely as possible, 
to make a single entry point possible.

http://www.gnu.org/software/grub/manual/multiboot/multiboot.html#Machine-state  

In this case, %eax holds a magic value (0x2BADB002) set by the 
loader/hypervisor, and %ebx holds a pointer to the multiboot information 
structure.  Its boot_loader_name field looks particularly interesting for 
determining the paravirt_ops index.
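
A minimal sketch in C of that check, assuming an entry stub has already 
saved %eax and %ebx and mapped the loader-name string so it can be read.  
The magic value, flags bit 9 and the offset (64) of boot_loader_name follow 
the Multiboot spec; the pv_index values and the "Xen"/"VMI" name prefixes 
are hypothetical, chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Value a Multiboot-compliant loader leaves in %eax at entry. */
#define MULTIBOOT_BOOTLOADER_MAGIC 0x2BADB002u
/* flags bit 9: the boot_loader_name field is valid. */
#define MBI_FLAG_BOOT_LOADER_NAME (1u << 9)

/* Leading fields of the Multiboot information structure %ebx points to. */
struct multiboot_info {
    uint32_t flags;
    uint32_t mem_lower, mem_upper;
    uint32_t boot_device;
    uint32_t cmdline;
    uint32_t mods_count, mods_addr;
    uint32_t syms[4];
    uint32_t mmap_length, mmap_addr;
    uint32_t drives_length, drives_addr;
    uint32_t config_table;
    uint32_t boot_loader_name;  /* physical address of a NUL-terminated string */
};

_Static_assert(offsetof(struct multiboot_info, boot_loader_name) == 64,
               "boot_loader_name must sit at offset 64");

/* Hypothetical paravirt_ops indices; not an existing kernel interface. */
enum pv_index { PV_NATIVE = 0, PV_XEN = 1, PV_VMI = 2 };

/*
 * Pick a paravirt_ops index from the machine state at entry.
 * loader_name is the string at mbi->boot_loader_name, already mapped by
 * the caller.  The "Xen" and "VMI" prefixes are assumptions.
 */
enum pv_index pick_pv_index(uint32_t eax, const struct multiboot_info *mbi,
                            const char *loader_name)
{
    if (eax != MULTIBOOT_BOOTLOADER_MAGIC || mbi == NULL)
        return PV_NATIVE;                 /* not a Multiboot entry */
    if (!(mbi->flags & MBI_FLAG_BOOT_LOADER_NAME) || loader_name == NULL)
        return PV_NATIVE;                 /* loader supplied no name */
    if (strncmp(loader_name, "Xen", 3) == 0)
        return PV_XEN;
    if (strncmp(loader_name, "VMI", 3) == 0)
        return PV_VMI;
    return PV_NATIVE;                     /* e.g. GRUB on bare metal */
}
```

Nothing here can fault as long as the pointers are mapped, which matters 
given that the startup path can't take faults.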

Or perhaps we should consider passing a different magic in %eax, which 
differentiates a hypervisor boot from a native boot, and use a different 
magic in the multiboot header to identify the alternative PV entry 
point.  Multiboot is a pretty common thing, and I'd rather use a 
standard method for identifying the entry than an ad-hoc string table.

Of course, all of this argument is made moot by my comments below.

>   
>> It's also possible that a hypervisor is fully virtualizing, so that boot 
>> proceeds via the normal startup_32 path, but at some later point the 
>> guest can register some pv_ops for better performance.  (Similar to your 
>> idea for an in-kernel modular hypervisor.)
>>     

It is quite possible this will be a _requirement_ for our 
implementation.  We boot through a fully virtualized environment, and 
getting rid of that is next to impossible currently.  We rely on our 
BIOS to scan the PCI bus, set up PCI devices, program hardware to 
initial settings, create the E820 map, load the boot sector, and jump 
into the bootloader.  I'd love to have a nice paravirtualized start of 
day like Xen, but from where we stand, that is a huge amount of work.  
So I think we may have to defer initialization until much later, at 
least for a while.  We could pass a magic value in %ebx, but that would 
require an alternative bootloader in front of the kernel.  For now we 
will need an ugly hack to probe for the VMI during the startup path.

Further on, I think we would like to move towards a kernel module (GPL 
module, no less) that inserts the VMI paravirt ops, but that won't 
happen until far later in the boot sequence - a true in-kernel modular 
hypervisor.
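
Roughly, the late-registration model could look like the following 
userspace sketch: the live ops table starts out native, so the unmodified 
startup path works under full virtualization, and a backend (found by a 
probe, or installed from a module's init) is registered later, keeping the 
native hooks it doesn't override.  struct pv_ops, cur_pv_ops, 
register_pv_ops and vmi_sketch_ops are hypothetical names, far smaller than 
the real paravirt_ops, and the sketch ignores the hard parts (SMP, switching 
while hooks are in use, patched call sites).

```c
#include <stddef.h>

/* Hypothetical, much-reduced ops table. */
struct pv_ops {
    const char *name;
    void (*irq_disable)(void);
    void (*irq_enable)(void);
};

static int irq_flag = 1;  /* models EFLAGS.IF in this userspace sketch */
static int hypercalls;    /* counts calls routed to the "hypervisor" */

static void native_irq_disable(void) { irq_flag = 0; }  /* real code: cli */
static void native_irq_enable(void)  { irq_flag = 1; }  /* real code: sti */

/* Live table: starts native, so an unmodified boot path just works. */
struct pv_ops cur_pv_ops = { "native", native_irq_disable, native_irq_enable };

/* Hypothetical backend standing in for VMI; overrides only irq_disable. */
static void vmi_irq_disable(void) { hypercalls++; irq_flag = 0; }
const struct pv_ops vmi_sketch_ops = { "vmi-sketch", vmi_irq_disable, NULL };

/*
 * Install a backend late in boot, keeping the native hook wherever the
 * backend leaves one NULL.  Returns 0 on success, -1 if ops is NULL or a
 * backend has already been registered.
 */
int register_pv_ops(const struct pv_ops *ops)
{
    static int registered;

    if (registered || ops == NULL)
        return -1;
    cur_pv_ops.name = ops->name;
    if (ops->irq_disable)
        cur_pv_ops.irq_disable = ops->irq_disable;
    if (ops->irq_enable)
        cur_pv_ops.irq_enable = ops->irq_enable;
    registered = 1;
    return 0;
}
```

Keeping native fallbacks per hook means a backend can start with only the 
operations that matter most for performance.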

Zach