On Tue, May 26, 2015 at 02:25:52PM +0100, Maciej W. Rozycki wrote:
> > > > - tlb_read_hazard
> > > >   Between tlbr and mfc0 (various TLB registers). This is copied from
> > > >   tlbw_use_hazard in all cases on the assumption that tlbr has similar
> > > >   data writer characteristics to tlbw, and mfc0 has similar data user
> > > >   characteristics to loads and stores.
> > >
> > > Be careful with this assumption, it does not stand for R4600/R4700 and
> > > R5000 processors (4 vs 3 intervening instructions), you need an extra NOP
> > > for them. Likewise there is a difference with the 5K (1 vs 0 intervening
> > > instructions), but it's already buried in our pessimistic barrier that
> > > assumes 4 intervening instructions.
> >
> > The TLB write hazard is 4 cycles on the 8 stage R4000 pipeline but 2 cycles
> > on the R4600 pipeline.
>
> I misinterpreted the numbers in the table for the R4600/R4700/R5000,
> sorry. It gives pipeline stage numbers rather than instruction counts,
> unlike the 5K and some other tables.
>
> The difference is still there however: for TLBW_/use it's (3 - 2 - 1) =>
> 0 and for TLBR/MFC0 it's (4 - 2 - 1) => 1 (there's a one-cycle slip for
> TLBW_ instructions causing it). We use 3 NOPs for this variant, so it'll
> be covered.
>
> > We handle this in a particularly non-obvious but
> > optimized way by exploiting the fact that the R4000 pipeline kills two
> > instructions following the branch delay slot like:
> >
> > 	.set	noreorder
> > 	MTC0	$reg, c0_sometlbregister
> > 	B	1f
> > 1:	NOP
> > 	TLBW
> >
> > where the branch-nop sequence will cost 4 cycles on the R4000's eight-stage
> > pipeline but only two on the R4600 pipeline.
>
> And this code is where? ISTR seeing it before, but now all I can see in
> <asm/hazards.h> for the R4k and friends is:
>
> #define __tlbw_use_hazard						\
> 	nop;								\
> 	nop;								\
> 	nop
>
> It does not cover the 4-instruction hazard of the original R4000 even, so
> it looks like it has to be fixed, perhaps by using code like you quoted.
Sorry for reciting kernel code from memory :-) I didn't check a current
kernel before posting. If it's been removed, then there might be a bug, or
at least something that can be optimized for the R4000.

  Ralf
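The NOP-count arithmetic quoted above, (writer stage - user stage - 1), can be sketched in C as below. This is a hypothetical illustration, not kernel code: `hazard_gap()` and `nops_cover()` are made-up names, and it assumes the first stage number is where the writer (TLBW_ or TLBR, slip included) makes its result available and the second is where the user (e.g. MFC0) consumes it.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not from <asm/hazards.h>): number of intervening
 * instructions a hazard needs, computed as (writer stage - user stage - 1)
 * and clamped at zero, as in the quoted R4600/R4700/R5000 arithmetic.
 *
 * Assumption: writer_stage is the pipeline stage in which the writer
 * (e.g. TLBW_ or TLBR, including any slip) makes its result available;
 * user_stage is the stage in which the user (e.g. MFC0) needs it.
 */
static int hazard_gap(int writer_stage, int user_stage)
{
	assert(writer_stage > 0 && user_stage > 0);

	int gap = writer_stage - user_stage - 1;
	return gap > 0 ? gap : 0;
}

/* True if a barrier of `nops` NOPs covers a hazard needing `gap` slots. */
static bool nops_cover(int nops, int gap)
{
	return nops >= gap;
}
```

With the stage numbers from the thread, `hazard_gap(3, 2)` is 0 (TLBW_/use) and `hazard_gap(4, 2)` is 1 (TLBR/MFC0), so the three NOPs of `__tlbw_use_hazard` cover both, while `nops_cover(3, 4)` is false for the original R4000's 4-instruction hazard.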