Re: [PATCH 1/5] KVM: PPC: Book3S HV: Handle guest-caused machine checks on POWER7 without panicking

Alexander Graf <agraf@xxxxxxx> · Mon, 26 Nov 2012 14:15:16 +0100

On 23.11.2012, at 22:42, Paul Mackerras wrote:

> On Fri, Nov 23, 2012 at 03:13:09PM +0100, Alexander Graf wrote:
>> 
>> On 22.11.2012, at 10:25, Paul Mackerras wrote:
>> 
>>> +	/* Do they have an SLB shadow buffer registered? */
>>> +	slb = vcpu->arch.slb_shadow.pinned_addr;
>>> +	if (!slb)
>>> +		return;
>> 
>> Mind to explain this case? What happens here? Do we leave the guest with an empty SLB? Why would this ever happen? What happens next as soon as we go back into the guest?
> 
> Yes, we leave the guest with an empty SLB, the access gets retried and
> this time the guest gets an SLB miss interrupt, which it can hopefully
> handle using an SLB miss handler that runs entirely in real mode.
> This could happen for instance while the guest is in SLOF or yaboot or
> some other code that runs basically in real mode but occasionally
> turns the MMU on for some accesses, and happens to have a bug where it
> creates a duplicate SLB entry.

Is this what pHyp does? Also, is this what we want? Why don't we populate an #MC into the guest so it knows it did something wrong?

Alex

> 
>>> +	/* Sanity check */
>>> +	n = slb->persistent;
>>> +	if (n > SLB_MIN_SIZE)
>>> +		n = SLB_MIN_SIZE;
>> 
>> Please use a max() macro here.
> 
> OK.
> 
>>> +	rb = 0x800;		/* IS field = 0b10, flush congruence class */
>>> +	for (i = 0; i < 128; ++i) {
>> 
>> Please put up a #define for this. POWER7_TLB_SIZE or so. Is there any way to fetch that number from an SPR? I don't really want to have a p7+ and a p8 function in here too ;).
>> 
>>> +		asm volatile("tlbiel %0" : : "r" (rb));
>>> +		rb += 0x1000;
>> 
>> I assume this also means (1 << TLBIE_ENTRY_SHIFT)? Would be nice to keep the code readable without guessing :).
> 
> The 0x800 and 0x1000 are taken from the architecture - it defines
> fields in the RB value for the flush type and TLB index.  The 128 is
> POWER7-specific and isn't in any SPR that I know of.  Eventually we'll
> probably have to put it (the number of TLB congruence classes) in the
> cputable, but for now I'll just do a define.
> 
>> So I take it that p7 does not implement tlbia?
> 
> Correct.
> 
>> So we never return 0? How about ECC errors and the likes? Wouldn't those also be #MCs that the host needs to handle?
> 
> Yes, true.  In fact the OPAL firmware gets to see the machine checks
> before we do (see the opal_register_exception_handler() calls in
> arch/powerpc/platforms/powernv/opal.c), so it should have already
> handled recoverable things like L1 cache parity errors.
> 
> I'll make the function return 0 if there's an error that it doesn't
> know about.
> 
>>> 	ld	r8, HSTATE_VMHANDLER(r13)
>>> 	ld	r7, HSTATE_HOST_MSR(r13)
>>> 
>>> 	cmpwi	r12, BOOK3S_INTERRUPT_EXTERNAL
>>> +BEGIN_FTR_SECTION
>>> 	beq	11f
>>> -	cmpwi	r12, BOOK3S_INTERRUPT_MACHINE_CHECK
>>> +END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
>> 
>> Mind to explain the logic that happens here? Do we get external interrupts on 970? If not, the cmpwi should also be inside the FTR section. Also, if we do a beq here, why do the beqctr below again?
> 
> I was making it not call the host kernel machine check handler if it
> was a machine check that pulled us out of the guest.  In fact we
> probably do still want to call the handler, but we don't want to jump
> to 0x200, since that has been patched by OPAL, and we don't want to
> make OPAL think we just got another machine check.  Instead we would
> need to jump to machine_check_pSeries.
> 
> The feature section is because POWER7 sets HSRR0/1 on external
> interrupts, whereas PPC970 sets SRR0/1.
> 
> Paul.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html