Peter Arremann <loony@xxxxxxxxxxxx> wrote:
> The suggestion of 8 was made mostly because there was no
> larger x86-64 platform available at that time.

Opteron 8xx processors are so named because 8-way is the maximum
number of Opterons that can be wired up with 3 HyperTransport links
each while keeping every CPU within 2 hops of any other.  Beyond
8-way you start to run into excessive hops, and that requires
further design considerations in both hardware and software.

I know many vendors are selling "scalable" 4-way Socket-940 boards
these days with two 3.2-8.0GBps HyperTransport connectors for daisy
chaining mainboards.  But HyperTransport eXpansion (HTX) is now the
preferred way to build clusters of 4-way Socket-940 boards, with
each system running its own OS.  InfiniBand over HTX is capable of
a "real world" 1.8GBps -- over 100% faster than the "real world"
performance of InfiniBand PCI-X 2.0 cards (typically used with
Xeon/Itanium).

> Also, the main reason for the limit is scalability. With
> more cpus comes more communications overhead, more
> congestion on the bus,

There is _no_ "bus" in Opteron.  Yes, Opterons will "share"
HyperTransport links when they cannot directly connect to another
CPU, but that is link sharing on a point-to-point fabric, not a
shared bus.

> less memory bandwidth for each cpu and so on.

Okay, this is _misleading_.  You're thinking Intel SMP.  Opterons
_always_ have 128 bits of DDR (2 channels) per CPU.  Opteron uses
NUMA (and HyperTransport partial meshes for CPU-I/O).  There is
_no_ "less memory bandwidth for each cpu."  That is a trait of
Intel SMP [A]GTL+, not AMD NUMA/HyperTransport.

Yes, if an Opteron has to access memory hanging off another CPU,
that is a performance hit.  And if the other CPU is on another
mainboard, then yes, contention can happen there.

> I remember in the good old days when smp was first added
> to the kernel, people said 2 CPUs was the max you can have...

I remember non-Linux/non-PC platforms where MP, not SMP, was used
-- from true crossbar switches (not "bus hubs") to the partial mesh
we now have in the Opteron.

In fact, this is one of the areas where Linux is very immature.
Its logic is still very SMP, it only has NUMA "hints," and it does
not scale well on a NUMA platform, let alone the partial mesh of
the Opteron 800's 2xDDR/3xHyperTransport _per_ CPU -- especially
when it comes to processor affinity for I/O.  It's a crapload
better than NT, but not better than many UNIX implementations.

Sun's support of the Opteron then became a no-brainer.  They could
deliver a partial-mesh platform at a commodity cost.

> we ran a few 4-way systems back then very effectively simply
> because our application had only a low volume of
> communications.

But you're still accessing memory.  I assume it was an Intel SMP
solution, and it therefore had the memory access limitations you
describe.  Those limitations are still wholly _inapplicable_ to
Opteron if you have an application and operating system that are
effective at processor affinity for processes.  And when it comes
to communication, processor affinity for I/O can do wonders -- but
_only_ on Opteron, not even on proprietary Xeon MP / Itanium
systems (because they are still "Front Side Bottleneck" designs).

I understand what you're trying to say.  But it's not very
applicable to Opteron in the least bit.

--
Bryan J. Smith                | Sent from Yahoo Mail
mailto:b.j.smith@xxxxxxxx     | (please excuse any
http://thebs413.blogspot.com/ | missing headers)
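P.S.  Since "processor affinity for processes" came up: for anyone
who wants to see what that looks like on Linux, here is a minimal,
untested sketch in C.  It pins the current process to one CPU with
the glibc sched_setaffinity() call and then allocates its working
buffer on that CPU's NUMA node with libnuma, so accesses stay on
the local Opteron's DDR channels instead of crossing HyperTransport
to another CPU's memory.  CPU 0, node 0 and the 64MB buffer size
are arbitrary placeholders, not anything from the discussion above.

    /*
     * Sketch: pin this process to one CPU, then allocate its buffer
     * on that CPU's NUMA node so memory accesses stay local.
     * Build with something like:  gcc affinity.c -lnuma
     * CPU 0 / node 0 / 64MB are placeholder values.
     */
    #define _GNU_SOURCE
    #include <sched.h>   /* sched_setaffinity(), CPU_ZERO(), CPU_SET() */
    #include <numa.h>    /* numa_available(), numa_alloc_onnode(), numa_free() */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        cpu_set_t mask;
        size_t len = 64UL * 1024 * 1024;   /* 64MB working buffer */
        char *buf;

        /* 1. Bind this process to CPU 0 only. */
        CPU_ZERO(&mask);
        CPU_SET(0, &mask);
        if (sched_setaffinity(0, sizeof(mask), &mask) != 0) {
            perror("sched_setaffinity");
            return 1;
        }

        /* 2. Allocate the buffer on NUMA node 0 -- the DDR attached
         *    to the same Opteron we are now pinned to.              */
        if (numa_available() < 0) {
            fprintf(stderr, "kernel has no NUMA support\n");
            return 1;
        }
        buf = numa_alloc_onnode(len, 0);
        if (buf == NULL) {
            fprintf(stderr, "numa_alloc_onnode failed\n");
            return 1;
        }

        /* Touch the pages so they are actually faulted in locally. */
        memset(buf, 0, len);

        /* ... real work on buf goes here ... */

        numa_free(buf, len);
        return 0;
    }

Roughly the same effect can be had without touching the code by
launching the program under numactl, e.g.
"numactl --cpubind=0 --membind=0 ./app".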