Re: Extreme system overhead on large IP27

Ralf Baechle <ralf@xxxxxxxxxxxxxx> · Thu, 26 Oct 2006 17:50:28 +0100

On Thu, Oct 26, 2006 at 03:51:35PM +0200, Kevin D. Kissell wrote:

> I don't see what's different here than in any other SMP case.

It just happened to be a configuration that triggered the issue.  But
the underlying problem could exist on any other SMP system using
per-processor timers.

>  Is it really
> true that the MIPS SMP support *requires* that all CPUs in the system
> come out of reset on the same clock, with the same value in Count?

There isn't even a requirement to use the cp0 counter at all.  It just
happens that the VSMP kernel is using that timer.  It also happens to be
quite a logical choice on the Malta, where the alternative would be
specific to one of the several system controllers.

SGI systems are infamous for potentially using mixed-spec CPUs from the
same family.  That includes different clock speeds; something like having
180MHz R10000 and 500MHz R14000 would be possible.  The only sane cure for
the time code in such cases is avoiding c0_count and relying on some other
system-wide time source.  The same may be needed in the case of a variable
CPU clock.

That said, Linux doesn't care; it just needs a little bit of glue code to
deal with arbitrary timers.
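
To make the "glue code" idea concrete, here is a minimal sketch of what
such a layer could look like.  It is not the kernel's actual interface:
struct time_source, sim_read, ticks_to_ns and elapsed_ns are all
hypothetical names, and the counter is simulated so the example runs in
user space.  The point is only that once a source reports its own rate,
the rest of the time code can work in nanoseconds and stop caring which
counter is behind it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical descriptor for a free-running counter (not a kernel API). */
struct time_source {
    const char *name;
    uint64_t (*read)(void);  /* current counter value */
    uint64_t hz;             /* counter increments per second */
};

/* Simulated counter standing in for a real hardware register. */
uint64_t sim_counter;

uint64_t sim_read(void)
{
    return sim_counter;
}

/*
 * Convert a tick delta to nanoseconds.  Whole seconds and the remainder
 * are handled separately so the multiplication cannot overflow for any
 * hz below roughly 1.8e10.
 */
uint64_t ticks_to_ns(const struct time_source *ts, uint64_t ticks)
{
    assert(ts->hz != 0);
    uint64_t secs = ticks / ts->hz;
    uint64_t rem = ticks % ts->hz;
    return secs * 1000000000ull + rem * 1000000000ull / ts->hz;
}

/* Nanoseconds elapsed since an earlier reading of the same source. */
uint64_t elapsed_ns(const struct time_source *ts, uint64_t start)
{
    return ticks_to_ns(ts, ts->read() - start);
}
```

With two sources at different (illustrative) rates, the same half second
is 90000000 ticks on a 180MHz counter and 250000000 ticks on a 500MHz
one, which is why raw tick counts from mixed-speed CPUs cannot be
compared without going through a conversion like this.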

> I find that very surprising (and a little disappointing).  Is this a general
> limitation of Linux? MIPS32/MIPS64 PRAs call out the reset value
> of Count as being undefined, and chip specs for pre-MIPS32 CPUs
> like the R10000 and the R4400 do not call out any reset value for
> Count either.

The count / compare code originated on uniprocessor systems, and the sole
thing it cares about is the speed at which the counter is incrementing,
not the absolute value.
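
As a rough illustration of "rate, not absolute value", the sketch below
programs the next interrupt relative to whatever the local counter
currently reads.  The helper names (next_compare, ticks_elapsed) and the
cycles_per_jiffy parameter are assumptions made for this example, not
the names used in the kernel, and plain integers stand in for the Count
and Compare registers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Compare value that will match n_jiffies from now on a CPU whose Count
 * currently reads `count`.  The arithmetic is unsigned 32-bit, so it
 * wraps modulo 2^32 just like the register itself.
 */
uint32_t next_compare(uint32_t count, uint32_t cycles_per_jiffy,
                      uint32_t n_jiffies)
{
    assert(cycles_per_jiffy != 0);
    return count + cycles_per_jiffy * n_jiffies;
}

/*
 * Ticks elapsed between two reads of the same CPU's Count.  Unsigned
 * subtraction gives the right answer across a single wraparound.
 */
uint32_t ticks_elapsed(uint32_t earlier, uint32_t later)
{
    return later - earlier;
}
```

Two CPUs whose counters read 1000 and 900000 at the same instant get
different Compare values from next_compare(), but both are exactly
cycles_per_jiffy * n_jiffies ticks ahead of their own Count, so the
starting offset between them never enters the calculation.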

> If there's going to be skew between CPU clocks, all it really means
> is that one cannot directly compare timestamps generated by different
> CPUs.  At a given point in time, "How long will it be until you hit an 
> absolute Count value X?"  will have a slightly different answer on each CPU 
> if there is skew, but "What will the local Count value be N jiffies from now?"
> should be something that can be correctly calculated independently on each 
> node. Where are we depending on the former, and can that usage be converted
> into something more like the latter?
>             Kevin K.

  Ralf