Rusty Russell <rusty at rustcorp.com.au> writes:

> On Fri, 2006-07-28 at 13:45 -0700, Zachary Amsden wrote:
>> Jeremy Fitzhardinge wrote
>> >
>> >> Why do you insist that Xen have a separate kernel entry point?
>> >
>> > Because if you're going to distinguish hypervisors by putting a magic
>> > value in a register, you get the best bang for the buck if the
>> > register is EIP.
>>
>> I think the larger issue that Eric pointed out is that there is only a
>> single entry field in the ELF header.
>
> I think this has been useful discussion, but I think we're starting to
> go in circles.
>
> (1) We can make startup_32 work for every known and future reasonable
> hypervisor as well as native, by testing if ring isn't 0 and paging is
> enabled and jumping to the paravirt entry path.
>
> (2) Xen 3.0's boot process uses %esi and sets all other regs to 0. We
> should not break this. If we use %ebx to indicate what paravirt_ops to
> use, and assign Xen 0, this works fine. We only clobber %ebx, %eax and
> %esp before calling paravirts[%ebx]->init().
>
> (3) If a hypervisor wants another entry point, this scheme doesn't
> prohibit it, of course. If every hypervisor does this, we've wasted
> some effort, but not for lack of trying.
>
> My question for Eric is: will this fit with kexec (which I know v.
> little about)?

So /sbin/kexec is user space code that I can make as simple or as
complex as needed. The kexec system call is a general method for
loading data outside of the kernel, and in principle it can load just
about anything (with the practical consideration that firmware calls
don't usually work after a kernel has run).

What I don't want to do is have a lot of variation in /sbin/kexec for
loading linux under different hypervisors for no particular reason.

One of the goals is a binary kernel that can work on real hardware and
work with different hypervisors. That means a bzImage kernel. Anything
else will be unnecessary confusion and maintenance for end users and
distro vendors.

Because a bzImage kernel decompresses the bulk of the kernel, multiple
entry points are almost totally impractical.

For a lot of reasons the bootloader<->kernel ABI is a very slowly
changing thing. One result is that if the kernel can probe its
environment and not depend on the bootloader to do the work, that is
the more reliable path. Another result is that we need to be very
careful with changes, because anything we do must be supported for a
long time, and the number of people who understand all of the details
is small.

In the unix community there is good general consensus on the ELF file
format, and because I need to express things that the current bzImage
format cannot express, I am in the process of upgrading our bzImage
kernels to also be ELF executables, while still supporting everything
they do today. (I should be posting my patches sometime Monday).

What I don't see is any consensus or reasonable proposal for argument
passing conventions.

Since there is no sane set of calling conventions that everyone can
agree on, or that is fundamentally better than what we currently have,
I don't see a point in doing something different than what linux
currently does if it can reasonably do the job. It will be specific to
linux but at least it will be standard there.

For background: The most important part of a calling convention is
that it is a widely used standard that people can agree upon.

A better argument passing convention would need to be based on tagged
values (so the set of parameters passed can be changed as time
passes), and probably on the C calling conventions so there is
portability between machine architectures. There isn't even a serious
proposal on the table for anything like that. That is what I believe
it would take for a parameter passing convention to be used by
multiple bootloaders in multiple environments, to boot multiple
operating systems, and to pass the test of time.
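
To make "tagged values" concrete, here is a minimal sketch of what I
mean. It is not a proposal: the tag numbers, struct layout and
function names are all invented for illustration. The point is only
that every record carries its own tag and size, so a consumer can skip
records it does not understand and the set of parameters can grow over
time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tag numbers, invented for this sketch. */
enum { TAG_END = 0, TAG_CMDLINE = 1, TAG_HYPERVISOR = 2 };

/* Every record starts with the same header, so a consumer can step
 * over records whose tag it does not recognise. */
struct boot_tag {
    uint32_t tag;   /* what the payload means */
    uint32_t size;  /* number of payload bytes after this header */
};

/* Append one record to buf and return the new used length. */
static size_t tag_append(uint8_t *buf, size_t used, size_t cap,
                         uint32_t tag, const void *data, uint32_t size)
{
    struct boot_tag hdr = { tag, size };

    assert(used + sizeof hdr + size <= cap);
    memcpy(buf + used, &hdr, sizeof hdr);
    if (size)
        memcpy(buf + used + sizeof hdr, data, size);
    return used + sizeof hdr + size;
}

/* Return the payload of the first record with the wanted tag, or
 * NULL if the list ends (or is truncated) before one is found. */
static const void *tag_find(const uint8_t *buf, size_t used,
                            uint32_t wanted, uint32_t *size_out)
{
    size_t off = 0;

    while (off + sizeof(struct boot_tag) <= used) {
        struct boot_tag hdr;

        memcpy(&hdr, buf + off, sizeof hdr);
        if (hdr.tag == TAG_END)
            break;
        if (hdr.size > used - off - sizeof hdr)
            break;              /* truncated record, stop walking */
        if (hdr.tag == wanted) {
            *size_out = hdr.size;
            return buf + off + sizeof hdr;
        }
        off += sizeof hdr + hdr.size;   /* skip unknown or unwanted record */
    }
    return NULL;
}
```

A producer would call tag_append() once per parameter and finish with
a TAG_END record; a consumer only ever calls tag_find() for the tags
it knows about.
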

Even there a table at a well known address in memory that standalone
executables can later probe might be a better approach.

In the existing model of booting operating systems it is the purpose
of bootloaders to translate between some environment and what a
kernel needs. For the hypervisors that do not do full virtualization
the work can get pushed to the bootloader.

So I do not see a sane alternative to using %esi to point to the
existing linux parameter block. We can easily stuff a hypervisor id in
the parameter block. So we can do:
"paravirts[%esi->hypervisor]->init();"

I know what I am suggesting is different than the existing hypervisor
practice on x86. Unfortunately the existing hypervisor practice is not
compatible with existing bootloader conventions. So something must
give.

If something I am requesting seems arbitrary please ask and I should
be able to expand on the justification. I have been doing this so long
it is easy to miss what context other people don't have.

Eric
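
For reference, the dispatch on a hypervisor id stored in the parameter
block could look roughly like the C below. This is only a sketch under
assumptions: the struct names, the "hypervisor" field, the id values
and the init functions are all hypothetical, and it does not reflect
the real layout of the linux parameter block or any existing
paravirt_ops definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ids, invented for this sketch. */
enum { HV_NATIVE = 0, HV_XEN = 1, HV_COUNT };

/* Stand-in for the parameter block that %esi points to.  Only the
 * assumed hypervisor id field is shown. */
struct boot_params_sketch {
    uint32_t hypervisor;
};

/* Stand-in for a per-hypervisor operations table. */
struct paravirt_ops_sketch {
    const char *name;
    int (*init)(const struct boot_params_sketch *params);
};

/* Placeholder init routines; each returns its own id so the caller
 * can tell which one ran. */
static int native_init(const struct boot_params_sketch *params)
{
    (void)params;
    return HV_NATIVE;
}

static int xen_init(const struct boot_params_sketch *params)
{
    (void)params;
    return HV_XEN;
}

static const struct paravirt_ops_sketch native_ops = { "native", native_init };
static const struct paravirt_ops_sketch xen_ops = { "xen", xen_init };

static const struct paravirt_ops_sketch *const paravirts[HV_COUNT] = {
    [HV_NATIVE] = &native_ops,
    [HV_XEN]    = &xen_ops,
};

/* Equivalent of paravirts[%esi->hypervisor]->init(), with a bounds
 * check so an unknown id is rejected instead of indexing off the end.
 * Returns -1 for a missing block or an unknown id. */
static int paravirt_dispatch(const struct boot_params_sketch *params)
{
    if (params == NULL || params->hypervisor >= HV_COUNT)
        return -1;
    assert(paravirts[params->hypervisor] != NULL);
    return paravirts[params->hypervisor]->init(params);
}
```

The bootloader (or the hypervisor's loader) fills in the id, and the
kernel needs only the one entry point to reach the right init routine.
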