On 28.06.2010, at 09:18, Milton Miller wrote:

> On Sun Jun 27 around 19:33:52 EST 2010 Alexander Graf wrote:
>> On 27.06.2010, at 10:14, Avi Kivity <avi at redhat.com> wrote:
>>> On 06/26/2010 02:25 AM, Alexander Graf wrote:
>
>>>> +
>>>> +PPC hypercalls
>>>> +==============
>>>> +
>>>> +The only viable ways to reliably get from guest context to host context are:
>>>> +
>>>> + 1) Call an invalid instruction
>>>> + 2) Call the "sc" instruction with a parameter to "sc"
>>>> + 3) Call the "sc" instruction with parameters in GPRs
>>>> +
>>>> +Method 1 is always a bad idea. Invalid instructions can be replaced later on
>>>> +by valid instructions, rendering the interface broken.
>>>> +
>>>> +Method 2 also has downfalls. If the parameter to "sc" is != 0 the spec is
>>>> +rather unclear if the sc is targeted directly for the hypervisor or the
>>>> +supervisor. It would also require that we read the syscall issuing instruction
>>>> +every time a syscall is issued, slowing down guest syscalls.
>>>> +
>
> It goes to the hypervisor, and it would require the hypervisor to
> return to the supervisor, but I believe it just returns to the user with
> permission denied.

That's what I assumed, yeah :(.

>
>>>> +Method 3 is what KVM uses. We pass magic constants (KVM_SC_MAGIC_R3 and
>>>> +KVM_SC_MAGIC_R4) in r3 and r4 respectively. If a syscall instruction with these
>>>> +magic values arrives from the guest's kernel mode, we take the syscall as a
>>>> +hypercall.
>>>>
>>>
>>> Is there any chance a normal syscall will have those values in r3
>>> and r4?
>>
>> r3 is the syscall number. So as long as the guest doesn't reuse that
>> value, we're safe. Since in general syscall numbers are not randomly
>> scattered throughout the number range, we should be ok here.
>>
>
> No, r0 has the system call number. Registers 3 and 4 are the first
> 2 args in the C ABI (or the first 64-bit arg in the 32-bit C ABI), but the
> Linux syscall ABI is special.
> (In addition, it returns success or failure in cr0).

Oh. Ahem :)

>
>>>
>>> If so, maybe it's better to use pc as the key for hypercalls. Let
>>> the guest designate one instruction address as the hypercall call
>>> point; kvm can easily check it and reflect it back to the guest if
>>> it doesn't match.
>>>
>>
>> You mean the guest would tell the hv where the hypercall lies? That
>> would require a hypercall, no? Defining it statically is tricky. I
>> want to PV'nize osx using a kernel module later, so I don't have
>> control over the physical layout.
>>
>>> Is it valid and useful to issue sc from privileged mode anyway,
>>> except for calling the hypervisor?
>>
>> Same as a syscall on x86 really. The kernel can and does issue
>> syscalls within itself.
>>
>>
>
> I don't believe we support the kernel actually doing a syscall to itself
> anymore, at least on powerpc. The callers call the underlying system
> call function, or kernel_thread.
>
> That said, I would suggest we allocate a syscall number for this, as it
> would document the usage. (In addition to 0..nr_syscalls - 1 we have
> 0x1ebe in use).

That's actually a pretty good idea.

>
> Also, is there any desire to nest such emulation?

Nesting should just work, right? Since we only accept hypercalls from PR=0 and guests run in PR=1, we get the sc interrupt in the L1 guest by then.

The only issue I'm aware of that completely breaks when using nested KVM on PPC is the MSR_IR != MSR_DR logic. For certain interrupts, the world switch handler fetches the instruction that caused the interrupt by keeping MSR_IR=0 but setting MSR_DR=1. KVM speeds up the MSR_DR != MSR_IR case by mapping both of them lazily in a special address space. So if you access the same page both as instruction and as data, you get an invalid result.

Alex
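
For reference, the method 3 check from the patch could look roughly like the sketch below. This is only an illustration, not the actual KVM code: the magic values are placeholders (the thread does not give the real ones), and struct guest_regs, sc_is_hypercall() and sc_selftest() are made-up names. MSR_PR here stands for the guest-visible problem-state bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values for illustration only; not the real constants. */
#define KVM_SC_MAGIC_R3 0x4b564d52u
#define KVM_SC_MAGIC_R4 0x554c455au

/* Problem-state bit of the MSR: set means user mode, clear means kernel mode. */
#define MSR_PR (1ull << 14)

/* Hypothetical, simplified view of the guest register state at the sc. */
struct guest_regs {
    uint64_t msr;     /* guest-visible MSR */
    uint64_t gpr[32]; /* general purpose registers r0..r31 */
};

/*
 * Decide whether an "sc" from the guest should be taken as a hypercall:
 * only from guest kernel mode (PR=0), and only if r3 and r4 carry the
 * magic constants. Anything else is an ordinary guest syscall.
 */
bool sc_is_hypercall(const struct guest_regs *regs)
{
    if (regs->msr & MSR_PR)
        return false; /* guest user mode: reflect as a normal syscall */

    return regs->gpr[3] == KVM_SC_MAGIC_R3 &&
           regs->gpr[4] == KVM_SC_MAGIC_R4;
}

/* Small self-check covering the three interesting cases. */
void sc_selftest(void)
{
    struct guest_regs regs = { 0 };

    regs.gpr[3] = KVM_SC_MAGIC_R3;
    regs.gpr[4] = KVM_SC_MAGIC_R4;
    assert(sc_is_hypercall(&regs));  /* kernel mode, both magics */

    regs.msr |= MSR_PR;
    assert(!sc_is_hypercall(&regs)); /* user mode is never a hypercall */

    regs.msr = 0;
    regs.gpr[4] = 0;
    assert(!sc_is_hypercall(&regs)); /* one magic value missing */
}
```

Note that r3 and r4 are ordinary syscall argument registers, as Milton points out, so a check like this relies on the magic values being unlikely arguments; allocating a dedicated syscall number in r0 would make the intent explicit.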