On Tue, 26 May 2015, Ralf Baechle wrote:

> > > - tlb_read_hazard
> > >   Between tlbr and mfc0 (various TLB registers). This is copied from
> > >   tlbw_use_hazard in all cases on the assumption that tlbr has similar
> > >   data writer characteristics to tlbw, and mfc0 has similar data user
> > >   characteristics to loads and stores.
> >
> > Be careful with this assumption, it does not stand for R4600/R4700 and
> > R5000 processors (4 vs 3 intervening instructions), you need an extra NOP
> > for them.  Likewise there is a difference with the 5K (1 vs 0 intervening
> > instructions), but it's already buried in our pessimistic barrier that
> > assumes 4 intervening instructions.
>
> The TLB write hazard is 4 cycles on the 8 stage R4000 pipeline but 2 cycles
> on the R4600 pipeline.

 I misinterpreted the numbers in the table for the R4600/R4700/R5000,
sorry.  It gives pipeline stage numbers rather than instruction counts,
unlike the 5K and some other tables.  The difference is still there
however: for TLBW_/use it's (3 - 2 - 1) => 0 and for TLBR/MFC0 it's
(4 - 2 - 1) => 1 (there's a one-cycle slip for TLBW_ instructions causing
it).  We use 3 NOPs for this variant, so it'll be covered.

> We handle this in a particularly non-obvious but
> optimized way by exploiting the fact that the R4000 pipeline kills two
> instructions following the branch delay slot like:
>
> 	.set	noreorder
> 	MTC0	$reg, c0_sometlbregister
> 	B	1f
> 1:	NOP
> 	TLBW
>
> where the branch-nop sequence will cost 4 cycles on the R4000's eight-stage
> pipeline but only two on the R4600 pipeline.

 And this code is where?  ISTR seeing it before, but now all I can see in
<asm/hazards.h> for the R4k and friends is:

#define __tlbw_use_hazard \
	nop; \
	nop; \
	nop

It does not cover the 4-instruction hazard of the original R4000 even, so
it looks like it has to be fixed, perhaps by using code like you quoted.

  Maciej
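
For reference, the stage-number arithmetic above can be written out as a
small C sketch.  This is illustration only and not kernel code: the helper
names and the parameter names (later_stage, earlier_stage) are made up
here, and the only inputs assumed are the figures quoted in this message.

```c
#include <assert.h>

/*
 * Hypothetical helper: when a hazard table lists pipeline stage numbers
 * rather than instruction counts, the number of instructions needed
 * between the two hazardous instructions is taken as
 * (later - earlier - 1), mirroring "(3 - 2 - 1) => 0" above.
 */
static int intervening_insns(int later_stage, int earlier_stage)
{
	assert(later_stage > earlier_stage);
	return later_stage - earlier_stage - 1;
}

/* Non-zero if a barrier made of `nops' NOPs covers `needed' instructions. */
static int barrier_covers(int nops, int needed)
{
	return nops >= needed;
}
```

With the figures from this message, intervening_insns(3, 2) gives 0 for
TLBW_/use and intervening_insns(4, 2) gives 1 for TLBR/MFC0, both of which
a 3-NOP barrier covers, whereas barrier_covers(3, 4) is false for a hazard
that needs 4 intervening instructions.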