Florian,

Could you do me a huge favor and try a build that uses 3 or 4 nops instead of the branch to the instruction after the delay slot? There was a reason that I eliminated the branch construct from the MIPS internal Linux source base: it's a hack that works perfectly on R4000s, but it's pretty much a coincidence that it does so.

Yes, the code fragment in question is R4K-specific, but we really need to migrate towards consistent mechanisms that work across the full range of MIPS CPUs. Ideally, *all* CP0 hazards should some day be padded out with "ssnops" (sll $0,$0,1, if I recall), which force a one-cycle delay per instruction even on superscalar MIPS CPUs.

Kevin K.
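
For reference, here is a rough sketch of the padding idea in C. It is not taken from any existing tree. The names encode_sll(), check_nop_encodings() and cp0_hazard_pad() are made up for illustration, and the count of four ssnops is an assumption that would have to be checked against the CPU manual for the hazard in question. The encoder just shows that a plain nop and an ssnop are both "sll $0,$0,sa" and differ only in the shift amount. The inline asm compiles to nothing on non-MIPS hosts.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Encode "sll rd, rt, sa": SPECIAL opcode (0), rs field 0, funct 0.
 * Only the rt, rd and sa fields contribute bits to the word.
 */
static inline uint32_t encode_sll(unsigned rd, unsigned rt, unsigned sa)
{
    return ((uint32_t)(rt & 0x1f) << 16) |
           ((uint32_t)(rd & 0x1f) << 11) |
           ((uint32_t)(sa & 0x1f) << 6);
}

/* Hypothetical self-check: nop is sll $0,$0,0 and ssnop is sll $0,$0,1. */
static inline void check_nop_encodings(void)
{
    assert(encode_sll(0, 0, 0) == 0x00000000u); /* nop   */
    assert(encode_sll(0, 0, 1) == 0x00000040u); /* ssnop */
}

/*
 * Hypothetical helper: pad a CP0 hazard with four ssnops instead of
 * branching to the instruction after the delay slot. The count of four
 * is an assumption for this sketch, not a documented requirement.
 */
static inline void cp0_hazard_pad(void)
{
#if defined(__mips__)
    __asm__ __volatile__(
        ".set push\n\t"
        ".set noreorder\n\t"
        "sll $0, $0, 1\n\t"
        "sll $0, $0, 1\n\t"
        "sll $0, $0, 1\n\t"
        "sll $0, $0, 1\n\t"
        ".set pop\n\t"
        : : : "memory");
#endif
}
```

The idea would be to call cp0_hazard_pad() right after the mtc0 that creates the hazard, in place of the branch. Spelling the instruction as an explicit sll keeps the sketch independent of whether the assembler in use knows the ssnop mnemonic.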