Re: FPU emulator unsafe for SMP?

"Kevin D. Kissell" <kevink@mips.com> · Wed, 20 Feb 2002 11:14:02 +0100

Daniel Jacobowitz wrote:
> On Tue, Feb 19, 2002 at 08:24:34PM -0800, Jun Sun wrote:
> > On Tue, Feb 19, 2002 at 10:28:35PM -0500, Greg Lindahl wrote:
> > > 
> > > Alpha seems to always save the fpu state (the comments say that gcc
> > > always generates code that uses it in every user process.)
> > 
> > I think the comment might be an excuse. :-)  Never heard of gcc
> > generating unnecessary floating point code.

It ain't gcc, it's glibc.  And it ain't just on the Alpha: just about
every MIPS process has FP state, even those that do not
declare a single FP variable.  However that's not a real
justification for whether or not one does lazy FPU context
management.  See below...

> I have :)  It may do memory moves in them, for instance.  Not sure if
> that makes sense on Alpha.

It probably does on one implementation or another.
We used the same trick back in the 1980's in libc
for the Fairchild Clipper, since it allowed better
parallelism between address computation and
memory operations.  Not only for memory moves,
but string operations!

> > > I suspect that the optimization of not saving the fpu state for a
> > > process that doesn't use the fpu is the most critical optimization.
> > > And that you do already.

Let me rephrase that - the advantage is in not saving *or restoring*
the FPU state for a process that isn't using the FPU *in its current
time slice*.
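
For concreteness, the lazy scheme under discussion can be modelled
in a few lines of user-space C.  This is only a toy sketch, not
kernel code: the names (lazy_cpu, lazy_thread, fpu_owner,
lazy_switch_to and friends) are made up for illustration, the
register file is just an array, and a single CPU is assumed.  The
FPU is disabled at every switch unless the incoming thread already
owns the registers, and state moves only when a thread actually
touches the FPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LAZY_FP_REGS 32

/* Hypothetical per-thread FP context kept in memory. */
struct lazy_thread {
    double saved_fp[LAZY_FP_REGS];
};

/* Hypothetical model of one CPU with a hardware FPU. */
struct lazy_cpu {
    double fp_regs[LAZY_FP_REGS];   /* the "hardware" FP registers     */
    struct lazy_thread *current;    /* thread now running              */
    struct lazy_thread *fpu_owner;  /* thread whose state is in fp_regs */
    bool fpu_enabled;               /* stands in for a CU1-style enable */
    unsigned saves, restores;       /* counters for the experiment     */
};

/* Context switch: no FP state is moved here at all. */
void lazy_switch_to(struct lazy_cpu *c, struct lazy_thread *next)
{
    c->current = next;
    /* Leave the FPU usable only if the registers already hold
     * next's state; otherwise the first FP access will trap. */
    c->fpu_enabled = (c->fpu_owner == next);
}

/* Models the trap taken on an FP access while the FPU is disabled. */
static void lazy_fpu_trap(struct lazy_cpu *c)
{
    assert(c->fpu_owner != c->current);
    if (c->fpu_owner != NULL) {
        /* Deferred save of the previous owner's registers. */
        memcpy(c->fpu_owner->saved_fp, c->fp_regs, sizeof c->fp_regs);
        c->saves++;
    }
    memcpy(c->fp_regs, c->current->saved_fp, sizeof c->fp_regs);
    c->restores++;
    c->fpu_owner = c->current;
    c->fpu_enabled = true;
}

void lazy_fp_write(struct lazy_cpu *c, int reg, double v)
{
    assert(c->current != NULL && reg >= 0 && reg < LAZY_FP_REGS);
    if (!c->fpu_enabled)
        lazy_fpu_trap(c);
    c->fp_regs[reg] = v;
}

double lazy_fp_read(struct lazy_cpu *c, int reg)
{
    assert(c->current != NULL && reg >= 0 && reg < LAZY_FP_REGS);
    if (!c->fpu_enabled)
        lazy_fpu_trap(c);
    return c->fp_regs[reg];
}
```

In this model, a thread that runs between two slices of an FP user
without touching the FPU itself costs zero saves and zero restores.
Note also that the owner's state lives only in one CPU's registers
until some other thread traps, which is where migration on SMP
gets awkward.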

> > If you do use floating point, I think it is pretty common to have
> > only one process that uses fpu and runs for very long.  In that case,
> > leaving FPU owned by the process also saves quite a bit.

One cannot make design decisions based on what one
"thinks is pretty common".  Binding threads to CPUs
(CPU affinity) is almost always more efficient when
the behavior of the workload looks like batch FORTRAN
processing.  It's when one gets a mix of computational
and interactive jobs that affinity often creates unfortunate
artifacts, and thus must be handled with care.

> Not true.  For instance, on a processor with hardware FPU, setjmp()
> will save FPU registers.  That means most processes will actually end
> up taking the FPU at least once.

Almost all MIPS/Linux threads, from init() onward, have FPU state, 
due to setjmp(), printf() (which uses the FP registers even
if one does not specify a floating point data item or format), etc.

> The general approach in Linux is to disable lazy switching on SMP.  I'm
> 95% sure that PowerPC does that.

Has anyone ever measured the performance impact of
lazy FPU context switching on MIPS?  It's one of those
ideas that was trendy in the 1980's, but I recall that when
we implemented it for SVR2 on the Fairchild Clipper
(which had only 16 FP registers), the measured improvement
on average context switch time was tiny - a percent or so.
We left it in, because it worked and it *was* an improvement,
but we would never have gone through the hassle had we
known how little it would buy us.

It occurs to me that we can to some degree "split
the difference" on FPU context management for
SMP if we *always* save the FPU state when a
thread switches out, but preserve the logic that
schedules threads with CU1 inhibited so that the
context is only *loaded* if the thread executes
FP instructions.  That would save about half of
the context switch overhead for non-FP-intensive
threads, while eliminating the migration problem.
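
A rough sketch of that split scheme, again as a toy user-space
model in C and not actual kernel code.  All names here (cpu,
thread, switch_out, switch_in, fp_read, fp_write, cu1_enabled)
are hypothetical.  One assumption to flag: "always save" is read
as "save at every switch-out at which the thread's context is
live in the registers", since a thread that never touched the FPU
in its slice has nothing of its own in the registers to save.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NUM_FP_REGS 32

/* Hypothetical per-thread FP context kept in memory. */
struct thread {
    double saved_fp[NUM_FP_REGS];
};

/* Hypothetical model of one CPU; cu1_enabled stands in for Status.CU1. */
struct cpu {
    double fp_regs[NUM_FP_REGS];
    struct thread *current;
    bool cu1_enabled;
    unsigned saves, loads;
};

/* Switch out: the save is never deferred.  Assumption: it is
 * skipped only when the thread never enabled the FPU this slice,
 * because then the registers hold no state belonging to it. */
void switch_out(struct cpu *c)
{
    struct thread *t = c->current;
    if (t != NULL && c->cu1_enabled) {
        memcpy(t->saved_fp, c->fp_regs, sizeof t->saved_fp);
        c->saves++;
    }
    c->cu1_enabled = false;
    c->current = NULL;
}

/* Switch in: schedule the thread with CU1 inhibited, load nothing. */
void switch_in(struct cpu *c, struct thread *t)
{
    c->current = t;
    c->cu1_enabled = false;
}

/* Models the coprocessor-unusable trap: load context on first use. */
static void fpu_trap(struct cpu *c)
{
    memcpy(c->fp_regs, c->current->saved_fp, sizeof c->fp_regs);
    c->cu1_enabled = true;
    c->loads++;
}

void fp_write(struct cpu *c, int reg, double v)
{
    assert(c->current != NULL && reg >= 0 && reg < NUM_FP_REGS);
    if (!c->cu1_enabled)
        fpu_trap(c);
    c->fp_regs[reg] = v;
}

double fp_read(struct cpu *c, int reg)
{
    assert(c->current != NULL && reg >= 0 && reg < NUM_FP_REGS);
    if (!c->cu1_enabled)
        fpu_trap(c);
    return c->fp_regs[reg];
}
```

Because the in-memory copy is up to date whenever a thread is off
the CPU, the model lets a thread switch out on one struct cpu and
switch in on another with no cross-CPU bookkeeping, and a thread
that does no FP work in a slice costs neither a save nor a load.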

            Regards,

            Kevin K.