On Wed, Feb 20, 2002 at 01:08:30AM +0100, Kevin D. Kissell wrote:

> It's gotta be done. I mean, the last I heard (which was a long
> time ago) mips64 Linux was keeping the CPU node number in
> a watchpoint register (or something equally unwholesome) and
> using that value as an index into tables.

NUMA nitpicking: cpu number != node number.

We store the CPU number in the PTEBase field of the c0_context
register, from bit 23 up.  This number is then used to index the
pgd_current[] array to find the root of the page table tree.

Having some extra on-die memory for such tiny bits of frequently
accessed information would be really cool.  I bet it would make a
visible difference.

> Sticking all the per-CPU
> state in a kseg3 VM page which is allocated and locked at boot
> time would be much cleaner and on the average probably quite
> a bit faster (definitely faster in the kernel but to be fair one has
> to factor in the increase in TLB pressure from the locked entry).

The plan is actually to map the 32-bit page tables into a flat array
of 4MB in size and to use one wired TLB mapping for that.  The other
half of the TLB entry mapping the root would still be available.

> But getting back to the original topic, there's another fun bug
> waiting for us in MIPS/Linux SMP floating point that can't
> be fixed as easily with VM sleight-of-hand.  Consider processes
> "A" and "B", where A uses FP and B does not: A gets scheduled
> on CPU 1, runs for a while, gets preempted, and B gets CPU 1.
> CPU 2 gets freed, so A gets scheduled on CPU 2.  Unfortunately,
> A's FP state is still in the FP register set of CPU 1.  The lazy FPU
> context switch either needs to be turned off (bleah!) or be fixed
> for SMP to handle the case where the "owner" of the FPRs
> on one CPU gets scheduled on another.
>
> The brute force approach would be to somehow send an interrupt to
> the CPU holding the FP state that causes it to cough it up into the
> thread context area.
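To make that hazard concrete, here's a minimal sketch of per-CPU
lazy-FPU ownership tracking; the names (fpu_owner, needs_fp_restore,
struct task) are made up for illustration, not real kernel symbols:

```c
#include <assert.h>

#define NR_CPUS 2

struct task {
	int pid;
};

/* Which task last left its live FP state in each CPU's FP registers. */
struct task *fpu_owner[NR_CPUS];

/*
 * Lazy switch-in check: skip the FP restore only if this CPU's FP
 * registers still hold the incoming task's state.  On SMP this test
 * alone is not enough: the task's live state may be parked on
 * ANOTHER CPU while the saved copy in its thread area is stale.
 */
int needs_fp_restore(const struct task *t, int cpu)
{
	return fpu_owner[cpu] != t;
}
```

On a uniprocessor the check is sufficient; on SMP, "A" migrating from
CPU 1 to CPU 2 trips needs_fp_restore() on CPU 2 while the only valid
copy of its FP state is still sitting in CPU 1's register file.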
> One alternative would be to give strict CPU affinity to the thread
> that has its FP state on a particular CPU.  That could complicate
> load balancing, but might not really be too bad.  At most one thread
> per CPU will be non-migratable at a given point in time.  In the
> above scenario, "A" could never migrate off of CPU 1, but "B" could,
> and would presumably be picked up by an idle CPU 2 as soon as its
> time slice is up on CPU 1.  That will be less efficient than doing
> an "FPU shootdown" in some cases, but it should also be more
> portable and easier to get right.
>
> Does this come up in x86-land?  The FPU state is much smaller
> there, so lazy context switching is presumably less important.

Yes, it's an issue in x86-land also.  Since the i386 code stopped
using task segments for context switching, their whole context
switching code has actually become reasonably sane and can be used as
a template.  In particular I like the fact that they got away without
the tons of CONFIG_SMP ifdefs that used to live in their kernel FP
code.  Time to re-read the i386 code.

Using an IPI for migration of an FP context between CPUs is a really
bad idea which may result in rather sucky worst-case context switch
times.  One of the facts that many performance tradeoffs in the Linux
world take for granted is blindingly fast context switch times.

The number of SMP platforms is growing.  It's mindboggling, but
people are actually building SMP-on-a-chip systems from cores that
were designed for uniprocessing.  I'd expect such systems to perform
like the early SMPs from the '80s, which is not very much at all ...

  Ralf
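P.S.  For the curious, the c0_context/PTEBase trick mentioned above
can be sketched in C like this.  The names (make_context, lookup_pgd,
MAX_CPUS) are hypothetical; the real refill path does this in a few
instructions of assembly, but the indexing idea is the same:

```c
#include <assert.h>
#include <stdint.h>

/* The CPU number lives in the PTEBase field, from bit 23 up. */
#define PTEBASE_SHIFT 23
#define MAX_CPUS 4

/* Page table root per CPU, indexed by CPU number. */
unsigned long pgd_current[MAX_CPUS];

/* Value the kernel would load into c0_context for a given CPU. */
uint32_t make_context(unsigned int cpu)
{
	return (uint32_t)cpu << PTEBASE_SHIFT;
}

/*
 * What the TLB refill path conceptually does: shift the CPU number
 * back out of c0_context and fetch that CPU's page table root.
 */
unsigned long lookup_pgd(uint32_t context)
{
	unsigned int cpu = context >> PTEBASE_SHIFT;
	return pgd_current[cpu];
}
```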