Re: Offtopic Posts [was Re: [OT] Memory Models and Multi/Virtual-Cores -- WAS: 4.0 -> 4.1 update failing]

b.j.smith at ieee.org (Bryan J. Smith) · Wed Jun 29 01:59:22 2005

On Tue, 2005-06-28 at 21:26 -0400, Peter Arremann wrote:
> They all have 0 to do with the problem?
> What kind of document would you accept? 

One that you will _not_ find on developer.intel.com.

Intel will _not_ tell you how to hack the Athlon MP to address PAE36
linearly at the TLB, because its processors can't do it prior to the
Xeon MP with EM64T.

> Alright - tell me what you want to see :-)

It's somewhere in the AMD system developer manuals, possibly ones not
publicly available.

We're not talking board-level, and we're not talking programmer-level
either.  We're talking about throwing the so-called "32-bit" Athlon [MP]
in a mode that _breaks_ GTL, something that would _not_ normally be
supported.

> Thank you - 32bit only please... we all agree that AMD64 can address
> more than 4GB without issues. 

This is a hack for so-called "32-bit" Athlon MP mainboards!

The so-called "32-bit" Athlon and so-called "64-bit" Athlon 64 / Opteron
are of the _same_core_ design for the _same_ platform, EV6.  I posted on
the 3-generation "heritage" of Intel (i386-486, Pentium, PPro-P4) and
AMD (386-486, Nx586-686/K5-K6, Athlon-Opteron).  You don't design cores
"on-a-dime," but for 5-7 years of lifespan (although the PPro is really
old, largely because Itanium _was_ Intel's 4th gen design!).

Athlon 64 / Opteron just moves more of the "traditional northbridge"
into the CPU, doubles the XMM registers and makes the ALU fully 64-bit.
Most of the "northbridge" changes were already in the so-called "32-bit"
Athlon, because EV6 is a 3-16 point _crossbar_ switch, not a "hub" like
Intel.  Because the CPUs in Athlon MP talk over _separate_, _switched_
interconnects, they must have some management units in the processors,
not the "single point-of-contention chipset."

A64/Opteron merely turns this into a "partial mesh" instead of a "single
switch."  I.e., instead of "switching" in the "single chipset," you now
"switch" in the individual CPUs.  The CPUs _always_ acted "independent"
-- even in Athlon MP, right into the EV6 switch.

The addressing is still 100% the same!  Even the addressing registers --
16-bit segment + 32-bit offset are the _same_!  There is just now an
official memory model called "Long Mode" -- the segment register becomes
bits 32-47.  In PAE36, the segment register is bits 4-36, with bits
4-31 being a "two's complement" with the offset register.

Now that's just the "programmer" level.

GTL was built for 32-bit.  It made _no_sense_ for Intel to modify GTL
until recently, because the underlying PAE36 model required paging in
the OS anyway.  I.e., why add all the logic to do linear addressing in
GTL beyond 32-bit if there was no OS to do it?!?!?!  Besides, IA-64 was
the future, right?

[ Interconnects and memory addressing are _not_ things you can "do on a
dime."  It took Intel years to develop GTL, and Digital years to develop
EV6.  And it took years for AMD to adapt EV6 for GTL compatibility. ]

Athlon, including 32-bit Athlon, was AMD's first design that was _not_
GTL compatible at all!  That means AMD had to add all sorts of GTL
compatibility into the chipset, CPU, etc...  Since they already had a
40-bit interconnect anyway, they decided to support legacy PAE36 GTL as
well as 32-bit GTL.  That way it could use legacy OSes up to 64GiB.
When these legacy PAE36 OSes run, they use Athlon MP in the same way
Intel does above 4GiB, paging.

That was _until_ this "hack."  It requires the BIOS to set up the EV6
interconnect in a way that _breaks_ GTL.  That means the OS has got to
know how to use it.  Athlon MP mainboards with this hack are _rare_ (I'm
still trying to find the e-mail that has the short list).

Now that x86-64 is here, Intel was _finally_ given a reason to make GTL
work above 4GiB.  They have now done so in the new 40-bit implementation
that Xeon MP uses.  Linux/x86-64 takes advantage of this.

But on Linux/x86, when you break 4GiB, the paging must accommodate.
What Intel doesn't have on its GTL/x86 that AMD/x86 does is a native,
linear 40-bit TLB capability.  Again, for Intel, it would have been a
waste of transistors, because paging is how a 32-bit OS _must_ work for
PAE36 -- or so it seemed.

On AMD, they already had >32-bit to support the EV6 interconnect.  EV6
was _not_ designed for x86, but AXP.  _All_ EV6 components are 40-bit
compatible, they have to be for the specification, including even the
so-called "32-bit" Athlon interconnect logic.

AMD had to add logic to support GTL.  They added PAE36 because they
already had the address space to spare.  _Every_other_ x86/PAE36 OS uses
it with paging.  This Linux hack is aware that the core TLB is designed
for _linear_ >32-bit, but the hardware must be configured in a way that
is _completely_incompatible_ with GTL, including PAE36.

Again, I'm waiting on the technical information from a foremost Linux
source at AMD.  He'll understand it better than I.

-- Bryan

P.S.  There is this farce out there that AMD64 allows 64-bit addressing.
It does _not_.  It allows PAE52/4PiB, PAE36/64GiB and 32-bit/4GiB
programmatic-virtual, 48-bit/256TiB programmatic-physical and
40-bit/1TiB interconnect-linear.  When running a PAE36 OS, it will
linearly address up to 36-bit/64GiB, using the native, linear EV6
interconnect.
That was (essentially) backported with this hack to so-called "32-bit"
Athlon, and implemented on a handful of Athlon MP mainboards.  Intel
x86/GTL+ is _not_ capable of this, because it _violates_ how the CPU
GTL+ talks to the MCH in Intel's own specs -- _until_ EM64T came out
(and even then there are still some issues prior to the new 40-bit Xeon
MP).
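
[ For anyone checking the sizes quoted above: each one is just 2 raised
to the address width, in bytes.  A minimal sketch in Python that does
only that arithmetic; the function name `addressable` and the `UNITS`
table are illustrative names, not anything from AMD or Intel
documentation, and it says nothing about what any given chipset or OS
will actually address. ]

```python
# Binary (IEC) unit names, each step being a factor of 1024.
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]

def addressable(bits):
    """Return the size of a 2**bits byte address space as a string."""
    size = 1 << bits          # number of distinct byte addresses
    unit = 0
    # Divide down by 1024 until the value fits the largest sensible unit.
    while size >= 1024 and unit < len(UNITS) - 1:
        size //= 1024
        unit += 1
    return f"{size}{UNITS[unit]}"

# The widths mentioned in this thread.
for bits in (32, 36, 40, 48, 52):
    print(f"{bits}-bit -> {addressable(bits)}")
```

That gives 4GiB, 64GiB, 1TiB, 256TiB and 4PiB for the five widths, which
matches the figures used in this post.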

P.P.S.  The i486 TLB has _always_ been capable of 48-bit/256TiB "virtual
addressing."  It was just always "normalized" into 32-bit physical
addresses.  PAE36 just normalizes them into 36-bit physical addresses,
although the PAE36 OSes still use a 32-bit offset register, which
requires the "paging."  What this "hack" does is take advantage of a
non-GTL compatible mode of the Athlon, just like the A64/Opteron, and
avoids paging at the TLB (IIRC).

-- 
Bryan J. Smith                                     b.j.smith@xxxxxxxx 
--------------------------------------------------------------------- 
It is mathematically impossible for someone who makes more than you
to be anything but richer than you.  Any tax rate that penalizes them
will also penalize you similarly (to those below you, and then below
them).  Linear algebra, let alone differential calculus or even ele-
mentary concepts of limits, is mutually exclusive with US journalism.
So forget even attempting to explain how tax cuts work.  ;->