[RFC] First (incomplete) cut of Xen paravirt binding

ebiederm at xmission.com (Eric W. Biederman) · Sun, 30 Jul 2006 22:00:08 -0600

Rusty Russell <rusty at rustcorp.com.au> writes:

> On Fri, 2006-07-28 at 13:45 -0700, Zachary Amsden wrote:
>> Jeremy Fitzhardinge wrote
>> >
>> >> Why do you insist that Xen have a separate kernel entry point?
>> >
>> > Because if you're going to distinguish hypervisors by putting a magic 
>> > value in a register, you get the best bang for the buck if the 
>> > register is EIP.
>> 
>> I think the larger issue that Eric pointed out is that there is only a 
>> single entry field in the ELF header.
>
> I think this has been useful discussion, but I think we're starting to
> go in circles.
>
> (1) We can make startup_32 work for every known and future reasonable
> hypervisor as well as native, by testing if ring isn't 0 and paging is
> enabled and jumping to the paravirt entry path.
>
> (2) Xen 3.0's boot process uses %esi and sets all other regs to 0.  We
> should not break this.  If we use %ebx to indicate what paravirt_ops to
> use, and assign Xen 0, this works fine.  We only clobber %ebx, %eax and
> %esp before calling paravirts[%ebx]->init().
>
> (3) If a hypervisor wants another entry point, this scheme doesn't
> prohibit it, of course.  If every hypervisor does this, we've wasted
> some effort, but not for lack of trying.
>
> My question for Eric is: will this fit with kexec (which I know v.
> little about)?

So /sbin/kexec is user space code that I can make as simple or
as complex as needed.  The kexec system call is a general method
for loading data outside of the kernel and can in principle load just
about anything (with the practical consideration that after a kernel
has run, firmware calls don't usually work).

What I don't want to do is have a lot of variation in /sbin/kexec
for loading linux under different hypervisors for no particular
reason.

One of the goals is a binary kernel that can work on real hardware
and work with different hypervisors.  That means a bzImage kernel.
Anything else will be unnecessary confusion and maintenance for
end users and distro vendors.

Because a bzImage kernel decompresses the bulk of the kernel,
multiple entry points are almost totally impractical.

For a lot of reasons the bootloader<->kernel ABI is a very slowly
changing thing.  One result is that if the kernel can probe
its environment and not depend on the bootloader to do the
work, that is the more reliable path.  Another result is that
we need to be very careful with changes, because anything
we do must be supported for a long time, and the number
of people who understand all of the details is few.

In the unix community there is good general consensus
on the ELF file format, and because I need to express things
that the current bzImage format cannot express I am in the
process of upgrading our bzImage kernels to also be ELF
executables, while still supporting everything they do today.
(I should be posting my patches sometime Monday).

What I don't see is any consensus or reasonable proposals for
argument passing conventions.  

Since there is no sane set of calling conventions that
everyone can agree on, or that is fundamentally better than
what we currently have, I don't see a point in doing
something different than what linux currently does if
it can reasonably do the job.  It will be specific to linux
but at least it will be standard there.

For background: The most important part of a calling convention
is that it is a widely used standard that people can agree upon.
A better argument passing convention would need to be based on tagged
values (so the set of parameters passed can be changed as time
passes), and probably on the C calling conventions so there is
portability between machine architectures.  There isn't even a serious
proposal on the table for anything like that.  That is what I believe
it would take for a parameter passing convention to be used by
multiple bootloaders in multiple environments and to boot multiple
operating systems, and to pass the test of time.  Even then
a table at a well known address in memory that standalone
executables can later probe might be a better approach.
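
To make "tagged values" concrete, here is a minimal sketch in C of
walking such a parameter list.  Nothing like this is proposed in the
thread: the record layout, the tag numbers and the name find_tag()
are all assumptions made up for illustration.  The point it shows is
that a consumer can step over records it does not understand, so the
set of parameters can grow over time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header: every parameter is a tag plus a size. */
struct tag_hdr {
    uint32_t tag;   /* which parameter this is (made-up numbering) */
    uint32_t size;  /* bytes in the whole record, header included */
};

/* Made-up tag numbers, for illustration only. */
enum { TAG_END = 0, TAG_CMDLINE = 1, TAG_MEM = 2 };

/*
 * Return a pointer to the payload of the first record carrying the
 * wanted tag, or NULL if there is none.  Records with other tags,
 * including ones this code has never heard of, are skipped by size.
 */
static const void *find_tag(const void *list, size_t len,
                            uint32_t wanted, size_t *payload_len)
{
    const unsigned char *p = list;
    size_t off = 0;

    while (len - off >= sizeof(struct tag_hdr)) {
        struct tag_hdr h;

        memcpy(&h, p + off, sizeof h);  /* no alignment assumption */
        if (h.tag == TAG_END)
            break;
        if (h.size < sizeof h || h.size > len - off)
            break;                      /* malformed list, stop */
        if (h.tag == wanted) {
            if (payload_len)
                *payload_len = h.size - sizeof h;
            return p + off + sizeof h;
        }
        off += h.size;
    }
    return NULL;
}
```

A caller would hand find_tag() the start and length of the list and
ask for one tag at a time; a NULL return simply means the loader did
not pass that parameter.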

In the existing model of booting operating systems it is the purpose
of bootloaders to translate between the environment and what
a kernel needs.  For the hypervisors that do not do full
virtualization, the work can get pushed to the bootloader.  So I do not
see a sane alternative to using %esi to point to the existing linux
parameter block.

We can easily stuff a hypervisor id in the parameter block.  So
we can do: "paravirts[%esi->hypervisor]->init();"
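
Spelled out in C, that dispatch might look roughly like the sketch
below.  The struct, the field name "hypervisor", the id values and
the function names are placeholders invented for illustration, not
anything that exists in the parameter block today.  Making 0 mean
native, so that a loader which leaves the field zeroed gets the bare
hardware path, is likewise an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hypervisor ids; 0 is native by assumption. */
enum { HV_NATIVE = 0, HV_XEN = 1, HV_MAX };

struct paravirt_ops {
    const char *name;
    int (*init)(void);
};

/* Stand-in for the parameter block that %esi points to. */
struct boot_params_sketch {
    uint32_t hypervisor;    /* hypothetical new field */
};

/* Dummy init routines; real ones would set up the environment. */
static int native_init(void) { return 0; }
static int xen_init(void)    { return 1; }

static const struct paravirt_ops native_ops = { "native", native_init };
static const struct paravirt_ops xen_ops    = { "xen", xen_init };

static const struct paravirt_ops *const paravirts[HV_MAX] = {
    [HV_NATIVE] = &native_ops,
    [HV_XEN]    = &xen_ops,
};

/* Look up the ops named by the parameter block; NULL if unknown. */
static const struct paravirt_ops *
select_paravirt(const struct boot_params_sketch *bp)
{
    if (bp == NULL || bp->hypervisor >= HV_MAX)
        return NULL;
    return paravirts[bp->hypervisor];
}

/* Equivalent of paravirts[%esi->hypervisor]->init(), with checks. */
static int paravirt_boot(const struct boot_params_sketch *bp)
{
    const struct paravirt_ops *ops = select_paravirt(bp);

    if (ops == NULL)
        return -1;          /* id this kernel does not know about */
    assert(ops->init != NULL);
    return ops->init();
}
```

The bounds check matters because the id comes from outside the
kernel: a loader that writes an id this kernel was not built with
should get a clean failure, not a jump through a wild pointer.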

I know what I am suggesting is different than the existing
hypervisor practice on x86.  Unfortunately the existing hypervisor
practice is not compatible with existing bootloader conventions.
So something must give.

If something I am requesting seems arbitrary please ask and
I should be able to expand on the justification.  I have been
doing this so long it is easy to miss what context other people don't
have.

Eric