On Tue, 26 Aug 2014, Joshua Kinard wrote:

> > This sound very unlikely as the CPU was primarily designed to run IRIX and
> > SGI's systems were using 16k or even 64k page size.
> >
> > What userland are you running and how old is it?  Are you seeing different
> > results for 16k and 64k?
>
> o32 userland is the primary on both systems.  However, the last SIGILL was
> under the 64k PAGE_SIZE kernel inside of an n32 chroot compiling the 'boost'
> package on the Octane, which I restarted that and it's not complained since.
> Also got SIGILL on the 16k PAGE_SIZE kernel when I booted 16k PAGE_SIZE the
> first time and ran 'ps'.  Subsequent runs of 'ps' didn't reproduce the
> error.  Also saw SIGILLs in the bootlog of the 16k PAGE_SIZE kernel when
> "rm" was ran once (couldn't reproduce) and when mdadm tried to put one of
> the arrays back together.  Subsequent runs using similar argument lines
> don't reproduce once I got to a root shell.

 Such intermittent failures look to me remarkably like cache coherency
problems, e.g. D$ vs I$.  You can try making cache maintenance more
aggressive, e.g. tweak all the writeback calls and invalidation calls so
that they perform their operation on the whole cache rather than on the
requested range only, and see if that makes things better.  If it does, a
range passed to one of these calls is probably too narrow or a call is
missing somewhere.  You may instead tweak only the suspected calling site
too, of course.

  Maciej
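
 To illustrate the experiment suggested above, here is a minimal sketch in
plain C.  It is a toy model only: a one-word-per-line D$ and I$ sitting over
a small memory array.  All names in it (store, fetch, writeback_dcache_range,
invalidate_icache_range, force_whole_cache, patch_code_and_run) are
hypothetical and are not the kernel's actual cache API, and the buggy caller
is invented purely for the demonstration.  It shows why widening every ranged
operation to the whole cache hides a too-narrow range, which is what makes
the tweak useful as a diagnostic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NLINES 8                /* toy caches: one word per line */

static int  memory[NLINES];
static int  dcache[NLINES];
static bool ddirty[NLINES];
static int  icache[NLINES];
static bool ivalid[NLINES];

/* Debug switch (hypothetical): widen every ranged op to the whole cache. */
static bool force_whole_cache;

static void reset_model(void)
{
    for (size_t i = 0; i < NLINES; i++) {
        memory[i] = dcache[i] = icache[i] = 0;
        ddirty[i] = ivalid[i] = false;
    }
}

/* Data store: lands in the D$ only; memory is stale until writeback. */
static void store(size_t line, int value)
{
    dcache[line] = value;
    ddirty[line] = true;
}

/* Instruction fetch: served from the I$, refilled from memory on a miss. */
static int fetch(size_t line)
{
    if (!ivalid[line]) {
        icache[line] = memory[line];
        ivalid[line] = true;
    }
    return icache[line];
}

/* Write dirty D$ lines first..last (inclusive) back to memory. */
static void writeback_dcache_range(size_t first, size_t last)
{
    if (force_whole_cache) {
        first = 0;
        last = NLINES - 1;
    }
    for (size_t i = first; i <= last; i++) {
        if (ddirty[i]) {
            memory[i] = dcache[i];
            ddirty[i] = false;
        }
    }
}

/* Invalidate I$ lines first..last (inclusive). */
static void invalidate_icache_range(size_t first, size_t last)
{
    if (force_whole_cache) {
        first = 0;
        last = NLINES - 1;
    }
    for (size_t i = first; i <= last; i++)
        ivalid[i] = false;
}

/*
 * Invented buggy caller: rewrites "code" in lines 2..4 but only
 * synchronises lines 2..3.  Returns what the CPU would fetch from line 4.
 */
static int patch_code_and_run(bool whole_cache)
{
    reset_model();
    force_whole_cache = whole_cache;

    for (size_t line = 2; line <= 4; line++)
        (void)fetch(line);              /* warm the I$ with the old code */
    for (size_t line = 2; line <= 4; line++)
        store(line, 42);                /* write the new code via the D$ */

    writeback_dcache_range(2, 3);       /* bug: line 4 is left out */
    invalidate_icache_range(2, 3);

    return fetch(4);
}
```

 In this model patch_code_and_run(false) fetches the stale old word from
line 4, while patch_code_and_run(true) fetches the new one.  On real
hardware the stale fetch would depend on what happens to be cached at the
time, which fits the intermittent nature of the reported SIGILLs.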