Re: Unexpected data TLB miss happens when guest OS executing a "bl" instruction

Jimi Xenidis <jimix@xxxxxxxxx> · Mon, 2 Jul 2012 14:03:56 -0500

On Jul 2, 2012, at 10:34 AM, Fei K Chen wrote:

> We are debugging kvm on the IBM PowerEN chip with the RISCWatch tool.

For those of you who don't know, the "RISCWatch tool" (RW) is a JTAG-based hardware probe debugger.

> An unexpected data TLB miss happened and we cannot explain why. Has anyone seen this before?
> 
> 1. Guest OS executes a "bl" instruction with PC=0xC0000000005A49CC. According to the guest linux kernel objdump file, the next instruction will be "mflr r0" with PC=0xC000000000599CC0.

Normally, if you were debugging the host, you could strip off the leading 0xC and read that location from physical memory to verify this.
However, since you are debugging the guest, stripping the 0xC only gives you a guest physical address, and you probably can't figure out the machine physical address behind it.
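
As a minimal sketch of the "strip off the 0xC" arithmetic, assuming the usual 64-bit PowerPC Linux linear mapping where kernel virtual addresses start at 0xC000000000000000 (the names PAGE_OFFSET usage and linear_virt_to_phys here are illustrative, not taken from any tool):

```python
# Assumption: 64-bit PowerPC Linux maps physical memory linearly at this base.
PAGE_OFFSET = 0xC000000000000000


def linear_virt_to_phys(vaddr: int) -> int:
    """Return the physical address behind a kernel linear-map virtual address.

    For a host kernel this is the machine physical address. For a guest
    kernel it is only the guest physical address.
    """
    if vaddr >> 60 != 0xC:
        raise ValueError("not a kernel linear-map address: %#x" % vaddr)
    return vaddr - PAGE_OFFSET


if __name__ == "__main__":
    # Branch target quoted in the report above.
    print(hex(linear_virt_to_phys(0xC000000000599CC0)))
```

For the branch target in this report that gives 0x599cc0, which is only directly readable from memory when the address belongs to the host.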

> 
> 2. By single-step execution in RISCWatch, guest OS does jump to an instruction with PC=0xC000000000599CC0. At this time, RISCWatch tool can not display what the instruction is. We guess this is because there is no instruction TLB entry in hardware TLB for PC=0xC000000000599CC0. Thus an instruction TLB miss is expected if we press the "Asmstep" to execute the next instruction.
> 
> 3. Unfortunately, poweren jumps to an instruction with PC=0xC000000000051FF4, which is the beginning of the data TLB miss entry in kvm. We read the values in SPRs SRR0 and DEAR. Both of them are 0xC000000000599CC0. We cannot even imagine why this happens.

When RW needs to look at memory it uses "instruction stuffing/ramming": it inserts instructions into the thread's instruction "port", so it performs loads and stores using translation in the same way that software does.
Since you are using the ASM window, it is trying to read the instruction with a stuffed load (before it is executed), and this causes your data fault.
So this is normal and completely expected.

Instead of using the ASM window, use the "command line window" to "step" and "read iar"; then you will get the instruction fault you are looking for.
Note: you may also be able to uncheck the "track IAR" box in the ASM window if you insist on using it.

> 
> 4. As external interrupts will happen during single-step debugging, we set a hardware breakpoint at PC=0xC000000000599CC0, and let poweren run directly to that point.

Yes, hardware probes become difficult with any interrupts active and, IMNSHO, should really only be used to debug bootstrap or exception level code.
It would be way easier to instrument the host fault handlers to help you debug this case.

> 
> 5. When poweren stops at PC=0xC000000000599CC0, the output of RISCWatch shows a "trap" instruction at PC=0xC000000000599CC0. This differs from what the kernel objdump file says should be there. The only explanation we can imagine is that our kvm code set a wrong TLB entry for PC=0xC000000000599CC0 (it may have been caused by that unexpected data TLB miss).

As explained above, I'm pretty sure you did not hit the data fault in the same way as before.
Does the rest of the instruction stream match? If not then you likely have a translation error.
However, there are only a handful of static "trap" instructions in vmlinux, so you should be able to track them down.
My bet is that some software (or RW) has inserted the trap instruction to implement some form of breakpoint.
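
The check described above can be sketched roughly as follows: compare the instruction words objdump says should be there against what the probe reads back. If the only differences are `trap` words (0x7FE00008, the encoding of `tw 31,0,0`), a planted breakpoint is the likely cause; if the stream differs in other ways, suspect the translation. The function name and the sample words are illustrative assumptions, not values taken from this kernel image.

```python
TRAP = 0x7FE00008  # encoding of the PowerPC `trap` instruction (tw 31,0,0)


def classify_mismatch(expected, observed):
    """Compare two equal-length lists of 32-bit instruction words.

    Returns "match", "breakpoint" (every differing word is a trap),
    or "translation" (the stream differs in some other way).
    """
    if len(expected) != len(observed):
        raise ValueError("instruction streams must be the same length")
    diffs = [i for i, (e, o) in enumerate(zip(expected, observed)) if e != o]
    if not diffs:
        return "match"
    if all(observed[i] == TRAP for i in diffs):
        return "breakpoint"
    return "translation"


if __name__ == "__main__":
    # Hypothetical function prologue: mflr r0; std r0,16(r1); stdu r1,-112(r1)
    from_objdump = [0x7C0802A6, 0xF8010010, 0xF821FF91]
    from_probe = [TRAP, 0xF8010010, 0xF821FF91]
    print(classify_mismatch(from_objdump, from_probe))
```

The words on the probe side have to be read through a working translation, for example after the instruction TLB miss has been handled, otherwise the read itself faults as described earlier.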

-jx

> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
