[RFC] First (incomplete) cut of Xen paravirt binding

jeremy at goop.org (Jeremy Fitzhardinge) · Fri, 28 Jul 2006 11:55:33 -0700

Hi Eric,

Thanks for having a look at this.

Eric W. Biederman wrote:
> The linux 32bit entry point is well defined.
> %ebx holds the cpu number; coming from a bootloader it must be 0.
> %esi holds a pointer to the linux parameter blob, that is usually
> filled in with BIOS calls.
>
> If you need hypervisor information at boot time and none of the
> existing parameters in %esi will suffice, bump the boot protocol
> version and allocate another variable.  And find that variable
> through %esi.  It's an extra instruction but not too hard.
> Bumping the boot protocol is probably desirable anyway because
> it will allow reporting that the kernel can be paravirtualized.
>
> /sbin/kexec ultimately has to operate in all of these environments,
> and it would be insane if the bootloader had to be modified for a
> different calling convention for each environment.  If another 
> bootloader is being used it can be taught how to impedance match
> between linux and the hypervisor environment, that is the job of
> a bootloader.
>
> For those environments where paravirtualization is just an optimization
> we most likely want the detection to happen in arch/i386/boot/setup.S
> if it needs to happen early.
>
> As a small bit of food for thought: there is currently work in progress to
> place an ELF header at the start of the bzImage to export the 32bit
> entry point, and to export the capability of the kernel being
> relocated.  
>
> Hopefully we can get to the point where we can boot a standard bzImage
> kernel on the hypervisors as well, even if we can't use the 16bit
> entry point.
>   
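
Eric's suggestion boils down to gating any new field on the boot protocol version, so the kernel never trusts a field that an older loader didn't write. A minimal C sketch of that check is below. `struct boot_params_sketch`, `struct hv_info`, `find_hv_info` and `HYPOTHETICAL_HV_INFO_VERSION` are all hypothetical names, not the real boot protocol layout, and a real kernel would do the lookup off %esi at the 32-bit entry point rather than through a C struct like this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical protocol version that introduces the new field; the real
 * value would be whatever the next boot protocol bump turns out to be. */
#define HYPOTHETICAL_HV_INFO_VERSION 0x0206u

/* Hypothetical hypervisor info block; the field is illustrative only. */
struct hv_info {
    uint32_t flags;
};

/* Hypothetical, cut-down stand-in for the parameter blob %esi points at.
 * Only the two fields this sketch needs are modelled. */
struct boot_params_sketch {
    uint16_t version;          /* boot protocol version the loader speaks */
    struct hv_info *hv_info;   /* valid only if version is new enough */
};

/* Return the hypervisor info block if the loader advertises a protocol
 * version new enough to carry it, otherwise NULL.  An older loader never
 * filled the field in, so its contents must be ignored. */
static struct hv_info *find_hv_info(const struct boot_params_sketch *bp)
{
    if (bp == NULL || bp->version < HYPOTHETICAL_HV_INFO_VERSION)
        return NULL;
    return bp->hv_info;
}
```

The version check is what lets one kernel image accept both old and new loaders: an old loader gets the native path, a new one can hand over hypervisor information without a separate calling convention.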

There are a few significant differences between a Xen boot up and a 
native one:

    Xen has already set up a clean flat 32-bit environment with paging
    enabled, so there's very little setup needed before getting into
    start_kernel (basically it just needs to set %esp and make sure the
    D flag is clear).  We definitely don't want to be going into the
    16-bit entrypoint.

    The kernel is running in ring != 0, so ring0 instructions will
    fault and popf will misbehave (it silently fails to change IF).  Xen can (and does) emulate some of the
    ring0 instructions, but not necessarily enough to deal with
    startup_32 (I haven't looked at this in detail yet).  Either way,
    startup_32 would need to be modified to avoid the difficult cases.

    Xen also supports privileged kernels which run in ring 0, but
    they're still fully paravirtualized kernels; they should not use
    their ring0 status to set up the processor state without doing it
    through Xen.

    At present, Xen also passes a pointer to an info-block in %esi.  We
    could hang that off a normal boot params block if that looked like a
    useful thing to do.

    Also, the set of supported CPUs is smaller, so most of the cpuid
    probing is redundant.  It would also need to be redone using the Xen
    version of cpuid to get a correct set of information.

This all makes me think it would be more awkward than helpful to have 
the Xen boot path go through the normal startup_32 path.

Zach proposed a change to the beginning of startup_32 to check whether 
it's running in ring != 0 or with paging already enabled, and if so jump 
to a startup_paravirt entrypoint.  That's workable, but it essentially 
means we're creating a distinct hypervisor boot protocol.  That's not 
necessarily a bad thing - and it could be made to look more like the 
normal boot protocol - but because the setup code is so simple there 
doesn't seem to be a lot to be gained from it.  In the Xen case, it 
makes more sense to simply have a separate Xen-specific entrypoint to do 
a little bit of setup before jumping into start_kernel.
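
The decision in Zach's proposal can be modelled as a small pure function, which makes the ordering constraint explicit. In reality this would be a few instructions of assembly at the top of startup_32; passing %cs and %cr0 in as plain values, and the `choose_entry` / `enum entry_path` names, are assumptions made so the logic can be tested in isolation. It relies only on two architectural facts: the CPL is the low two bits of the %cs selector, and CR0.PG is bit 31.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_PG (1u << 31)   /* CR0.PG: paging enabled */

/* Hypothetical names for the two possible paths out of startup_32. */
enum entry_path { ENTRY_NATIVE, ENTRY_PARAVIRT };

/* Decide which path startup_32 would take given the %cs and %cr0 values
 * it would observe.  The CPL test comes first deliberately: on real
 * hardware, reading %cr0 at ring != 0 would itself fault. */
static enum entry_path choose_entry(uint16_t cs, uint32_t cr0)
{
    /* Ring 1-3: certainly running under a hypervisor. */
    if ((cs & 3u) != 0)
        return ENTRY_PARAVIRT;

    /* Ring 0, but paging is already on, so something set it up for us. */
    if (cr0 & CR0_PG)
        return ENTRY_PARAVIRT;

    return ENTRY_NATIVE;
}
```

The CPL test catches an ordinary Xen guest running in ring 1, while the paging test catches a privileged kernel that Xen starts in ring 0 with paging already enabled.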

    J