On 27.04.2012, at 13:23, Alexander Graf wrote: > > On 27.04.2012, at 07:48, Paul Mackerras wrote: > >> On Thu, Apr 26, 2012 at 12:19:03PM +0200, Alexander Graf wrote: >> >>> So switch the code over to call into the Linux C handlers from C code, >>> speeding up everything along the way. >> >> I have to say this patch makes me pretty uneasy. There are a few >> things that look wrong to me, but more than that, it seems to me that >> there would be a lot of careful thought needed to make sure that the >> approach is bullet-proof. > > Yay, finally some review on it :). This method is currently used identically in booke hv, so everything we find broken here also applies there! > >> The first thing is that you are filling in the registers, and in >> particular r1, in a subroutine, so you are potentially making regs.r1 >> point to a stack frame that no longer exists by the time we look at it >> inside do_IRQ or timer_interrupt. So, for example, a stack trace >> could go completely off the rails at that point. Quite possibly gcc >> will inline the kvmppc_fill_pt_regs function, which would probably >> save you, but I don't think you should rely on that. > > Ugh. Right. > >> The second thing is, why do you save just r1, ip, msr, and lr? Why >> those ones and no others? A performance monitor interrupt might well >> decide to save all the registers away plus a stack trace, and to see >> all the GPRs as 0 could be very confusing. > > Well, any other state at that point is pretty useless, since we've long deferred from the original IP the interrupt arrived at. So if a perfmon module reads out other GPRs there, they are basically guaranteed to be useless anyway, no? > >> Thirdly, if preemption is enabled, it could well be that the interrupt >> will wake up a higher priority task which should run before we >> continue. On 64-bit you are probably saved by the soft_irq_enable >> calls, which will (I think) call schedule() if a reschedule is >> pending, but on 32-bit soft_irq_enable does nothing. > > At that point preemption is disabled. > >> Fourthly, as Ben said, you should be setting regs->trap. > > Yup :). Very good catch that one. > >> Have you measured a performance improvement with this patch? If so >> how big was it? > > Yeah, I tried things on 970 in an mfsprg/mtsprg busy loop. I measured 3 different variants: > > C irq handling: 1004944 exits/sec > asm irq handling: 1001774 exits/sec > asm + hsrr patch: 994719 exits/sec > > So as you can see, that code change does have quite an impact. But maybe the added complexity isn't worth it? Either way, we should try and find a solution that works the same way for booke and book3s - I don't want such an integral part to differ all that much. How about this patch? This at least gets rid of the hsrr performance penalty: diff --git a/arch/powerpc/kvm/book3s_segment.S b/arch/powerpc/kvm/book3s_segment.S index 6bae0a9..8b2fc66 100644 --- a/arch/powerpc/kvm/book3s_segment.S +++ b/arch/powerpc/kvm/book3s_segment.S @@ -198,6 +198,7 @@ kvmppc_interrupt: /* Save guest PC and MSR */ #ifdef CONFIG_PPC64 BEGIN_FTR_SECTION + mr r10, r12 andi. r0,r12,0x2 beq 1f mfspr r3,SPRN_HSRR0 @@ -317,23 +318,17 @@ no_dcbz32_off: * Having set up SRR0/1 with the address where we want * to continue with relocation on (potentially in module * space), we either just go straight there with rfi[d], - * or we jump to an interrupt handler with bctr if there - * is an interrupt to be handled first. In the latter - * case, the rfi[d] at the end of the interrupt handler - * will get us back to where we want to continue. + * or we jump to an interrupt handler if there is an + * interrupt to be handled first. In the latter case, + * the rfi[d] at the end of the interrupt handler will + * get us back to where we want to continue. */ - cmpwi r12, BOOK3S_INTERRUPT_EXTERNAL - beq 1f - cmpwi r12, BOOK3S_INTERRUPT_DECREMENTER - beq 1f - cmpwi r12, BOOK3S_INTERRUPT_PERFMON -1: mtctr r12 - /* Register usage at this point: * * R1 = host R1 * R2 = host R2 + * R10 = raw exit handler id * R12 = exit handler id * R13 = shadow vcpu (32-bit) or PACA (64-bit) * SVCPU.* = guest * @@ -343,12 +338,26 @@ no_dcbz32_off: PPC_LL r6, HSTATE_HOST_MSR(r13) PPC_LL r8, HSTATE_VMHANDLER(r13) - /* Restore host msr -> SRR1 */ +#ifdef CONFIG_PPC64 +BEGIN_FTR_SECTION + andi. r0,r10,0x2 + beq 1f + mtspr SPRN_HSRR1, r6 + mtspr SPRN_HSRR0, r8 +END_FTR_SECTION_IFSET(CPU_FTR_HVMODE) +#endif +1: /* Restore host msr -> SRR1 */ mtsrr1 r6 /* Load highmem handler address */ mtsrr0 r8 /* RFI into the highmem handler, or jump to interrupt handler */ - beqctr + cmpwi r12, BOOK3S_INTERRUPT_EXTERNAL + beqa BOOK3S_INTERRUPT_EXTERNAL + cmpwi r12, BOOK3S_INTERRUPT_DECREMENTER + beqa BOOK3S_INTERRUPT_DECREMENTER + cmpwi r12, BOOK3S_INTERRUPT_PERFMON + beqa BOOK3S_INTERRUPT_PERFMON + RFI kvmppc_handler_trampoline_exit_end: -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html