Daniel Jacobowitz wrote:
> On Tue, Feb 19, 2002 at 08:24:34PM -0800, Jun Sun wrote:
> > On Tue, Feb 19, 2002 at 10:28:35PM -0500, Greg Lindahl wrote:
> > >
> > > Alpha seems to always save the fpu state (the comments say that gcc
> > > always generates code that uses it in every user process.)
> >
> > I think the comment might be an excuse. :-) Never heard of gcc
> > generating unnecessary floating point code.

It ain't gcc, it's glibc. And it ain't just on the Alpha: just about
every MIPS process has FP state, even those that do not declare a
single FP variable. However, that's not a real justification for
whether or not one does lazy FPU context management. See below...

> I have :) It may do memory moves in them, for instance. Not sure if
> that makes sense on Alpha.

It probably does on one implementation or another. We used the same
trick back in the 1980's in libc for the Fairchild Clipper, since it
allowed better parallelism between address computation and memory
operations. Not only for memory moves, but string operations!

> > > I suspect that the optimization of not saving the fpu state for a
> > > process that doesn't use the fpu is the most critical optimization.
> > > And that you do already.

Let me rephrase that - the advantage lies in not saving *or restoring*
the FPU state for a process that isn't using the FPU *in its current
time slice*.

> > If you do use floating point, I think it is pretty common to have
> > only one process that uses fpu and runs for very long. In that case,
> > leaving FPU owned by the process also saves quite a bit.

One cannot make design decisions based on what one "thinks is pretty
common". Binding threads to CPUs (CPU affinity) is almost always more
efficient when the behavior of the workload looks like batch FORTRAN
processing. It's when one gets a mix of computational and interactive
jobs that it often creates unfortunate artifacts, and thus must be
handled with care.

> Not true. For instance, on a processor with hardware FPU, setjmp()
> will save FPU registers. That means most processes will actually end
> up taking the FPU at least once.

Almost all MIPS/Linux threads, from init() onward, have FPU state, due
to setjmp(), printf() (which uses the FP registers even if one does
not specify a floating point data item or format), etc.

> The general approach in Linux is to disable lazy switching on SMP. I'm
> 95% sure that PowerPC does that.

Has anyone ever measured the performance impact of lazy FPU context
switching on MIPS? It's one of those ideas that was trendy in the
1980's, but I recall that when we implemented it for SVR2 on the
Fairchild Clipper (which had only 16 FP registers), the measured
improvement on average context switch time was tiny - a percent or so.
We left it in, because it worked and it *was* an improvement, but we
would never have gone through the hassle had we known how little it
would buy us.

It occurs to me that we can to some degree "split the difference" on
FPU context management for SMP if we *always* save the FPU state when
a thread switches out, but preserve the logic that schedules threads
with CU1 inhibited so that the context is only *loaded* if the thread
executes FP instructions. That would save about half of the context
switch overhead for non-FP-intensive threads, while eliminating the
migration problem.

Regards,

Kevin K.
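
For illustration only, here is a rough user-space sketch in C of the
"save eagerly, load lazily" scheme described in the last paragraph
above. It is a simulation, not kernel code: struct thread, struct cpu,
switch_in(), switch_out(), fpu_trap(), fp_write(), fp_read() and
demo_migration() are all hypothetical names, and the cu1_enabled flag
merely stands in for the CU1 bit of the MIPS Status register. It also
assumes one particular reading of "always save": a thread's registers
are written back at switch-out whenever its FP context was live in the
time slice just ended, since otherwise the register file would hold
some other thread's values.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NUM_FP_REGS 32

/* Hypothetical per-thread state; not the kernel's real task structure. */
struct thread {
    double fp_saved[NUM_FP_REGS]; /* in-memory copy of the FP context */
    bool   cu1_enabled;           /* stands in for Status.CU1 */
};

/* One simulated CPU with a private FP register file and counters. */
struct cpu {
    double fp_regs[NUM_FP_REGS];
    struct thread *current;
    unsigned saves, restores;
};

/* Switch-in: schedule the thread with CU1 inhibited; load nothing yet. */
void switch_in(struct cpu *c, struct thread *t)
{
    t->cu1_enabled = false;
    c->current = t;
}

/* Models the coprocessor-unusable trap on the first FP instruction
 * of a time slice: only now is the saved context loaded. */
void fpu_trap(struct cpu *c)
{
    memcpy(c->fp_regs, c->current->fp_saved, sizeof c->fp_regs);
    c->current->cu1_enabled = true;
    c->restores++;
}

/* Simulated FP instruction that writes a register. */
void fp_write(struct cpu *c, int reg, double v)
{
    assert(c->current != NULL && reg >= 0 && reg < NUM_FP_REGS);
    if (!c->current->cu1_enabled)
        fpu_trap(c);
    c->fp_regs[reg] = v;
}

/* Simulated FP instruction that reads a register. */
double fp_read(struct cpu *c, int reg)
{
    assert(c->current != NULL && reg >= 0 && reg < NUM_FP_REGS);
    if (!c->current->cu1_enabled)
        fpu_trap(c);
    return c->fp_regs[reg];
}

/* Switch-out: save eagerly if the context was live in this slice, so
 * the in-memory copy is always current and the thread can resume on
 * any CPU. */
void switch_out(struct cpu *c)
{
    struct thread *t = c->current;
    assert(t != NULL);
    if (t->cu1_enabled) {
        memcpy(t->fp_saved, c->fp_regs, sizeof t->fp_saved);
        t->cu1_enabled = false;
        c->saves++;
    }
    c->current = NULL;
}

/* Thread A uses the FPU on cpu0, thread B runs there without touching
 * it, then A migrates to cpu1 and reads its value back. */
double demo_migration(struct cpu *cpu0, struct cpu *cpu1)
{
    struct thread a = {0}, b = {0};
    double v;

    switch_in(cpu0, &a);
    fp_write(cpu0, 0, 3.5);
    switch_out(cpu0);          /* A's context is now in memory */

    switch_in(cpu0, &b);       /* B executes no FP instructions */
    switch_out(cpu0);          /* so B costs neither save nor load */

    switch_in(cpu1, &a);       /* A resumes on a different CPU */
    v = fp_read(cpu1, 0);      /* trap loads A's context on cpu1 */
    switch_out(cpu1);
    return v;
}
```

Called with two zero-initialized struct cpu values, demo_migration()
returns the 3.5 that thread A wrote on the first CPU, and the counters
show that thread B triggered no save and no load at all. Since no
thread's state is ever left behind in a CPU's register file, the
migration case needs no special handling in this model.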