On 07.10.2012, at 15:26, Avi Kivity wrote:

> On 10/07/2012 03:19 PM, Alexander Graf wrote:
>>
>> On 07.10.2012, at 15:13, Avi Kivity wrote:
>>
>>> On 10/07/2012 01:41 AM, Alexander Graf wrote:
>>>> SPRs on PowerPC are the equivalent to MSRs on x86. They usually
>>>> control behavior inside a core, so the best place to emulate them
>>>> traditionally has been the kernel side of kvm.
>>>>
>>>> However, some SPRs should be emulated by user space. For example
>>>> the DBCR0 register which is used for machine reset. Or the interrupt
>>>> acknowledge register on e500 which is tightly integrated with the
>>>> interrupt controller that lives in user space.
>>>>
>>>> So let's expose "unknown" SPR reads and writes to user space, so that
>>>> it can handle them if it knows what's going on.
>>>>
>>>> As a nice side effect, we also get a lot better error reporting up
>>>> to user space, since now we actually know when an SPR read/write failed.
>>>>
>>>
>>> We have a similar problem with x86 MSRs.
>>
>> Yup. The new APIC MSR registers would also have the same problem, right?
>
> Which new APIC MSR registers?

I thought x2apic can be accessed through MSRs? If you want to emulate that in user space, you need something similar.

>
>>
>>>
>>>>
>>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>>> index e726d76..7a35c64 100644
>>>> --- a/Documentation/virtual/kvm/api.txt
>>>> +++ b/Documentation/virtual/kvm/api.txt
>>>> @@ -2253,6 +2253,32 @@ The possible hypercalls are defined in the Power Architecture Platform
>>>> Requirements (PAPR) document available from www.power.org (free
>>>> developer registration required to access it).
>>>>
>>>> +    /* KVM_EXIT_SPR */
>>>> +    struct {
>>>> +        __u64 sprn;
>>>> +        __u64 data;
>>>> +        __u64 msr;
>>>> +        __u8 is_write;
>>>> +#define SPR_STATUS_OK 0
>>>> +#define SPR_STATUS_FAIL 1
>>>> +        __u8 status;
>>>> +    } spr;
>>>> +
>>>> +This is used on PowerPC for Special Purpose Register emulation that
>>>> +the kernel can not deal with.
>>>> +
>>>> +It occurs when the guest triggers an mtspr or mfspr instruction on
>>>> +an SPR that is not handled by kvm's SPR emulation code. In these
>>>> +cases, 'sprn' contains the SPR ID. That ID is target CPU specific.
>>>> +'data' contains the value to write to the SPR when 'is_write'==1 (mtspr)
>>>> +or is used as result buffer for 'is_write'==0 (mfspr). Status is used
>>>> +to tell the kernel that an SPR read/write was successful. It is set to
>>>> +SPR_STATUS_OK by default. If user space fails to emulate an SPR access,
>>>> +it should set it to SPR_STATUS_FAIL, so that the kernel can inject
>>>> +an exception into the guest context. The field 'msr' contains the MSR
>>>> +register state at the point of time the SPR read/write occurred. It can
>>>> +be used by user space for permission checks.
>>>> +
>>>
>>> Since this happens in the middle of instruction emulation, the same
>>> rules should apply. Userspace must reenter kvm with the response to the
>>> instruction, and can force an immediate exit by queueing an unmasked signal.
>>
>> Ah, yes. I forgot to add this exit to that section of the spec.
>>
>>> What happens when a future kvm starts emulating an SPR that was
>>> previously emulated in userspace?
>>
>> That depends on a case-by-case basis.
>>
>> a) SPR is emulated in kernel space because the device is emulated in kernel space.
>>
>> This is the typical in-kernel irqchip case. We just only enable SPR traps for those SPRs in the kernel if the in-kernel irqchip is in use.
>>
>> b) Some bits need to be emulated by kernel space instead because they change vcpu behavior
>>
>> That one is more tricky.
>> I would assume you could handle the few bits that change vcpu behavior in kernel space, then fail the instruction emulation and pass the rest on to user space as you did before for the rest of the bits.
>>
>> Is there any other case? I can't think of any OTOH :).
>>
>
> An SPR becomes heavily used by a guest, and there is therefore pressure
> to emulate it in the kernel in order to improve performance.

Then you enable a CAP to have it handled in kernel space, so that both user and kernel space know about it.

> The downside of this generic approach is that it prepares surprises down
> the road. The alternative approach, of adding a new KVM_EXIT_RESET,
> avoids this minefield, but requires ABI changes every time we want to
> emulate something in userspace. Can you provide a critique of this
> alternate approach?

Yeah, it doesn't scale as well. The SPR read/write exit gives us all the information we need to emulate other registers too, like the magical "read this SPR and automatically get the interrupt vector from the MPIC and ack the interrupt along the way" register we have on e500. We'd have to add a new exit for that one as well. And for the next, and the next.

Plus, today we don't get good error messages when we fail an SPR read/write. With this approach, you do. And you can potentially configure whether you want to ignore unknown SPRs on a per-VM basis.


Alex