On Tue, 22 Jan 2002, Tommy S. Christensen wrote:

> Well, why not use the stack?
>
> I am not quite familiar with the requirements on this "thread register",
> but couldn't something like this be made to work:
> #define TID *((sp & ~(STACK_SIZE-1)) + STACK_SIZE - TID_OFFSET)

Last time I looked at how pthreads worked, it did use the stack pointer to
decide what the TID is. It got rather ugly because the stack of thread 0 was
not under program control, so it had all sorts of unknown properties. But
that could be fixed with kernel support, I think.

The only reason I can think of to have a *fast* thread-local variable is to
implement thread-local storage. This is a good thing for glibc and
multi-threaded programs; the ultimate implementation would probably be to
have gcc know about it (if ia64 has dedicated hardware, it is not
unimaginable, and other compilers do implement this):

extern int errno __attribute__((thread_local));

On i386 this has often been done by using fs/gs to point to a block of RAM.

However, I expect you could also base the thread-local RAM on the top (or
bottom) of the stack, which means each procedure can compute the (constant!)
base in a couple of instructions. The runtime knows how much to set aside
before it begins executing the new thread, and aligning SP for tid 0 can be
done in a kernel-independent way.

I don't know if this is worse than making the TLB handler slower to free up
k0/k1; it entirely depends on how many functions will be using thread-local
data.

Jason
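The stack-based scheme discussed in this thread (mask SP down to an aligned stack base, then read a block reserved at the top) can be sketched in userspace C. This is a minimal sketch under stated assumptions, not kernel or glibc code: STACK_SIZE, struct tls_block, current_block() and run_on_aligned_stack() are hypothetical names; every thread is started on a STACK_SIZE-aligned stack via POSIX pthread_attr_setstack(); and the address of a local variable stands in for reading SP directly. As Jason notes for thread 0, the main thread's stack is not aligned this way, so current_block() is only meaningful on threads started through run_on_aligned_stack().

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: each thread stack is STACK_SIZE bytes, aligned to
   STACK_SIZE, with a per-thread block reserved in its top TID_OFFSET bytes.
   STACK_SIZE must be a power of two for the mask below to work. */
#define STACK_SIZE ((uintptr_t)1 << 20) /* 1 MiB */

struct tls_block {  /* hypothetical per-thread data */
    long tid;
    int err;        /* e.g. a thread-local errno */
};

/* Round the block size up to 16 so the usable stack below stays aligned. */
#define TID_OFFSET ((sizeof(struct tls_block) + 15) & ~(uintptr_t)15)

/* Locate the calling thread's block from its stack: the address of a local
   lies on the current stack, so masking it yields the aligned stack base. */
static struct tls_block *current_block(void)
{
    char probe;
    uintptr_t sp = (uintptr_t)&probe;
    uintptr_t base = sp & ~(STACK_SIZE - 1);
    return (struct tls_block *)(base + STACK_SIZE - TID_OFFSET);
}

/* Thread body: finds its own block without any dedicated register. */
static void *worker(void *arg)
{
    (void)arg;
    struct tls_block *b = current_block();
    b->err = (int)(b->tid * 10);     /* touch only this thread's block */
    return (void *)(intptr_t)b->tid; /* report the tid it found */
}

/* Run worker() on a STACK_SIZE-aligned stack whose top block holds `tid`.
   Returns the tid the thread saw via current_block(), or -1 on failure. */
static long run_on_aligned_stack(long tid)
{
    void *stack = aligned_alloc(STACK_SIZE, STACK_SIZE);
    if (stack == NULL)
        return -1;
    assert(((uintptr_t)stack & (STACK_SIZE - 1)) == 0);

    struct tls_block *b =
        (struct tls_block *)((char *)stack + STACK_SIZE - TID_OFFSET);
    b->tid = tid;
    b->err = 0;

    long result = -1;
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) == 0) {
        pthread_t th;
        void *ret = NULL;
        /* Give the thread everything below the reserved block. */
        if (pthread_attr_setstack(&attr, stack, STACK_SIZE - TID_OFFSET) == 0 &&
            pthread_create(&th, &attr, worker, NULL) == 0 &&
            pthread_join(th, &ret) == 0 &&
            (long)b->err == tid * 10)
            result = (long)(intptr_t)ret;
        pthread_attr_destroy(&attr);
    }
    free(stack);
    return result;
}
```

Calling run_on_aligned_stack(7) should return 7, meaning the thread located its block purely from its stack address. Build with -pthread on older toolchains. The trade-off raised in the thread is visible here: the lookup costs only an AND and an ADD, but the runtime must control the size and alignment of every stack.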