> I attached the test case. Untar it. Type 'make' and run 'a.out'.
>
> If the test fails you will see a print-out. Otherwise you see nothing.
>
> It does not always fail. But if it fails, it is usually pretty consistent.
> Try a few times. Moving source tree to a different directory may cause
> the symptom to appear or disappear.
>
> I spent quite some time to trace this problem, and came to suspect
> there might be a hardware problem.
>
> The problem involves emulating a "lw" instruction in cp1 branch delay
> slot, which needs to set up trampoline in user stack. The net effect
> looks as if the icache line or dcache line is not flushed properly.
>
> Using gdb/kgdb, printf or printk in any useful places would hide the bug.
>
> I did find a smaller part of the problem. flush_cache_sigtramp for
> MIPS32 (4Kc) calls protected_writeback_dcache_line in mips32_cache.h.
> It uses Hit_Writeback_D, and the 4Kc manual says it is not implemented
> and executed as no-op (*ick*).

Which version of the 4Kc manual are you looking at? I'm looking at
a very recent version of the 4Kc Software User's Manual (version 1.17,
dated September 25, 2002), and it only shows Hit_Writeback_D to be
invalid for *secondary and tertiary* caches, which makes sense,
since the 4KSc doesn't have any.

> Even after fixing this, I still see the problem happening.

That's not too surprising. The 4Kc D-cache is write-through, so the
trampoline data reaches memory without an explicit writeback. If you're
really seeing a problem with trampolines, it is almost certain to be a
problem with the Icache invalidation, not the Dcache flush.

> If you replace flush_cache_sigtramp() with flush_cache_all(), symptom
> would disappear.

Which again would make sense if there's a problem on the icache side
of the flush. Oddly enough, we've seen some glitches on other CPUs with
other kernels that might have been explicable by failures of
protected_flush_icache_line(), but we never found a problem with it,
and a higher-level memory management patch made the problem go away.
Makes me wonder if we shouldn't look at it again, more closely.
Is there any possibility that the logic for restarting a protected
kernel access following a page fault will somehow screw up on CACHE
instructions, as opposed to the loads and stores for which the code
was originally written?

> Several of my tests seem to suggest it is the icache that did not
> get flushed (or updated) properly.
>
> Not re-producible on other MIPS boards. At least so far.
>
> Does anybody with more knowledge about 4Kc have any clues here?
>
> Thanks.
>
> Jun
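
As a footnote to the Hit_Writeback_D question above, here is a minimal sketch of how the 5-bit op field of the MIPS32 CACHE instruction splits into a cache selector and an operation. It may help when reading the manual's table: the "secondary and tertiary" restriction applies to different op values than the primary D-cache one. The numeric values are written from my recollection of the MIPS32 encoding and of the kernel's cacheops.h, so treat them as assumptions and check them against your copy of the manual. The helper names (cache_op, cache_op_target, and so on) are hypothetical and exist only for this illustration.

```c
#include <string.h>

/* Low two bits of the CACHE op field select which cache is addressed
 * (assumed encoding: 0 = primary I, 1 = primary D, 2 = tertiary,
 * 3 = secondary). */
enum cache_target { CACHE_I = 0, CACHE_D = 1, CACHE_T = 2, CACHE_S = 3 };

/* Upper three bits select the operation (assumed values, already shifted
 * into place). */
#define OP_HIT_INVALIDATE     0x10u
#define OP_HIT_WRITEBACK_INV  0x14u
#define OP_HIT_WRITEBACK      0x18u

/* Combine an operation and a target into a 5-bit op value. */
static inline unsigned cache_op(unsigned operation, enum cache_target t)
{
    return (operation & 0x1cu) | ((unsigned)t & 0x3u);
}

/* Extract the cache selector from an op value. */
static inline enum cache_target cache_op_target(unsigned op)
{
    return (enum cache_target)(op & 0x3u);
}

/* Extract the operation bits from an op value. */
static inline unsigned cache_op_operation(unsigned op)
{
    return op & 0x1cu;
}

/* Human-readable name for a cache selector. */
static inline const char *cache_target_name(enum cache_target t)
{
    switch (t) {
    case CACHE_I: return "primary I";
    case CACHE_D: return "primary D";
    case CACHE_T: return "tertiary";
    case CACHE_S: return "secondary";
    }
    return "unknown";
}
```

Under these assumptions, cache_op(OP_HIT_WRITEBACK, CACHE_D) gives the Hit_Writeback_D value used on the dcache side of the sigtramp flush, and cache_op(OP_HIT_INVALIDATE, CACHE_I) gives the icache invalidate that the discussion above suspects. The same Hit_Writeback operation aimed at CACHE_S or CACHE_T yields different op values, which are the ones a core without secondary or tertiary caches would have no reason to implement.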