On Mon, Aug 23, 2004 at 07:12:57PM +0200, Ralf Baechle wrote:
> Thiemo and I have been compiling various pieces of code with different
> gcc versions trying to find the best possible register for that purpose.
> We used code bloat as a (weak ...) indicator for register pressure.  It
> turned out that $t9 was the best choice for all tested compiler versions;
> thanks to the much improved register allocation of newer gcc the choice
> of a particular register made far less difference on recent compilers
> than on older compilers.
>
> I've also implemented a fast system call for reading the thread registers.
> Benchmarks did show that to have about half the latency of a regular
> syscall; the hope was if gcc was doing clever optimization that overhead
> would effectively become zero.
>
> I was favoring this low-overhead syscall approach because it would avoid
> the loss of a register thus leaving performance of non-threaded code
> unchanged but other developers generally favor the permanent allocation
> of $t9 as a thread register.

Personally, I favor doing the low-overhead syscall for o32 and then
moving to the new ABI that MIPS is talking about, which has a thread
register.  I'm not sure what to do about n32/n64.

> Other crazy ideas did include a per-thread mapping containing the thread
> pointer - and possibly more information in the future.

Does MIPS have an efficient way to do this for SMP?

> On the positive side if we had multiple register sets on a MIPSxx V2
> processor we could exploit that to get rid of this overhead and do
> other nice optimizations for TLB reload also.  Unfortunately these
> register sets are only an optional feature of the architecture.

That's more or less what was talked about for ARM v6.

-- 
Daniel Jacobowitz