On Wed, Feb 20, 2002 at 01:08:30AM +0100, Kevin D. Kissell wrote:

> It's gotta be done. I mean, the last I heard (which was a long
> time ago) mips64 Linux was keeping the CPU node number in
> a watchpoint register (or something equally unwholesome) and
> using that value as an index into tables.

NUMA nitpicking: cpu number != node number.

We store the CPU number in the PTEBase field of the c0_context
register, from bit 23 up.  This number is then used to index the
pgd_current[] array to find the root of the page table tree.

Having some extra on-die memory for such tiny bits of frequently
accessed information would be really cool.  I bet it would make a
visible difference.

> Sticking all the per-CPU
> state in a kseg3 VM page which is allocated and locked at boot
> time would be much cleaner and on the average probably quite
> a bit faster (definitely faster in the kernel but to be fair one has
> to factor in the increase in TLB pressure from the locked entry).

The plan is actually to map the 32-bit page tables into a flat array
of 4MB in size and to use one wired TLB mapping for that.  The other
half of the TLB entry mapping the root would still be available.

> But getting back to the original topic, there's another fun bug
> waiting for us in MIPS/Linux SMP floating point that can't
> be fixed as easily with VM sleight-of-hand.  Consider processes
> "A" and "B", where A uses FP and B does not: A gets scheduled
> on CPU 1, runs for a while, gets preempted, and B gets CPU 1.
> CPU 2 gets freed, so A gets scheduled on CPU 2.  Unfortunately,
> A's FP state is still in the FP register set of CPU 1.  The lazy FPU
> context switch either needs to be turned off (bleah!) or be fixed
> for SMP to handle the case where the "owner" of the FPRs
> on one CPU gets scheduled on another.
>
> The brute force approach would be to somehow send an interrupt to
> the CPU holding the FP state that causes it to cough it up into the
> thread context area.
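To make that hazard concrete, here's a minimal sketch of per-CPU
lazy-FPU ownership tracking; the names (fpu_owner, needs_fp_restore,
struct task) are made up for illustration, not real kernel symbols:

```c
#include <assert.h>

#define NR_CPUS 2

struct task {
	int pid;
};

/* Which task last left its live FP state in each CPU's FP registers. */
struct task *fpu_owner[NR_CPUS];

/*
 * Lazy switch-in check: skip the FP restore only if this CPU's FP
 * registers still hold the incoming task's state.  On SMP this test
 * alone is not enough: the task's live state may be parked on
 * ANOTHER CPU while the saved copy in its thread area is stale.
 */
int needs_fp_restore(const struct task *t, int cpu)
{
	return fpu_owner[cpu] != t;
}
```

On a uniprocessor the check is sufficient; on SMP, "A" migrating from
CPU 1 to CPU 2 trips needs_fp_restore() on CPU 2 while the only valid
copy of its FP state is still sitting in CPU 1's register file.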
> One alternative would be to give strict CPU affinity to the thread
> that has its FP state on a particular CPU.  That could complicate
> load balancing, but might not really be too bad.  At most one thread
> per CPU will be non-migratable at a given point in time.  In the
> above scenario, "A" could never migrate off of CPU 1, but "B" could,
> and would presumably be picked up by an idle CPU 2 as soon as its
> time slice is up on CPU 1.  That will be less efficient than doing
> an "FPU shootdown" in some cases, but it should also be more
> portable and easier to get right.
>
> Does this come up in x86-land?  The FPU state is much smaller
> there, so lazy context switching is presumably less important.

Yes, it's an issue in x86-land also.  Since the i386 code stopped
using task segments for context switching, their whole context
switching code has actually become reasonably sane and can be used as
a template.  In particular I like the fact that they got away without
the tons of CONFIG_SMP ifdefs that used to live in their kernel FP
code.  Time to re-read the i386 code.

Using an IPI for migration of an FP context between CPUs is a really
bad idea which may result in rather sucky worst-case context switch
times.  One of the facts that many performance tradeoffs in the Linux
world take for granted is blindingly fast context switch times.

The number of SMP platforms is growing.  It's mindboggling, but
people are actually building SMP-on-a-chip systems from cores that
were designed for uniprocessing.  I'd expect such systems to perform
like the early SMPs from the '80s, which is not very much at all ...

  Ralf
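P.S.  For the curious, the c0_context/PTEBase trick mentioned above
can be sketched in C like this.  The names (make_context, lookup_pgd,
MAX_CPUS) are hypothetical; the real refill path does this in a few
instructions of assembly, but the indexing idea is the same:

```c
#include <assert.h>
#include <stdint.h>

/* The CPU number lives in the PTEBase field, from bit 23 up. */
#define PTEBASE_SHIFT 23
#define MAX_CPUS 4

/* Page table root per CPU, indexed by CPU number. */
unsigned long pgd_current[MAX_CPUS];

/* Value the kernel would load into c0_context for a given CPU. */
uint32_t make_context(unsigned int cpu)
{
	return (uint32_t)cpu << PTEBASE_SHIFT;
}

/*
 * What the TLB refill path conceptually does: shift the CPU number
 * back out of c0_context and fetch that CPU's page table root.
 */
unsigned long lookup_pgd(uint32_t context)
{
	unsigned int cpu = context >> PTEBASE_SHIFT;
	return pgd_current[cpu];
}
```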