On 7/27/09 11:05 AM, "Dave Youatt" <dave@xxxxxxxxxxxxxxxxxxx> wrote:

> On 01/-10/-28163 11:59 AM, Greg Smith wrote:
>> On Tue, 21 Jul 2009, Doug Hunley wrote:
>>
> Also, and this is maybe getting too far off topic, but beyond the
> buzzwords, what IS the new "hyperthreading" in Nehalems? Opportunistic
> superpipelined CPUs? Superscalar? What's shared by the cores
> (bandwidth, cache(s))? What's changed about the new hyperthreading
> that makes it actually seem to work (or at least not cause other
> problems)? Smarter scheduling of instructions to take advantage of
> stalls and hazards in another thread's instruction stream? Fixed
> instruction-level locking/interlocks, or avoiding locking whenever
> possible? Better cache coherency mechanisms (related to the
> interconnects)? Jedi mind tricks???

The Nehalems are an iteration of the "Core" processor line, which is a
4-way superscalar, out-of-order CPU with very sophisticated memory
access reordering. The HyperThreading here (Simultaneous
Multi-Threading, or SMT, is the academic name) takes advantage of that
processor's inefficiencies -- a mix of stalls while waiting for memory
and unused execution 'width'. If both threads are active and not
stalled on memory access or other execution bubbles, there are a lot of
internal processor resources to share, and if one of them is
misbehaving and spinning, it won't dominate those resources.

The old Pentium 4 based HyperThreading was also SMT, but those
processors were built to be high frequency and 'narrow' in terms of
superscalar execution (2-way superscalar, I believe). So the main
advantage of HT there was that one thread could schedule work while
another was waiting on memory access. If both were putting demands on
the core's execution resources there was not much to gain unless one
thread stalled on memory a lot, and if one of them was spinning it
would eat up most of the shared resources.

In both cases the main execution resources get split up: L1 cache,
instruction buffers and decoders, instruction reorder buffers, etc.
But in this release Intel increased several of these beyond what is
optimal for a single thread, to make HT more efficient. Still, the
type of application that benefits most from this HT is not always the
same as for the older one, since the two CPU lines have different
weaknesses for SMT to mask and different strengths for it to enhance.

> I'm guessing it's the better interconnect, but work interferes with
> finding the time to research and benchmark.

The new memory and interconnect architecture has a huge impact on
performance, but it is separate from the other big features (Turbo
being the other one not discussed here). For scalability to many CPUs
it is probably the most significant, however.

Note that these CPUs have some good power saving technology that helps
quite a bit when idle or when using just one core or thread, but when
all threads are ramped up and all the memory banks are filled, the
systems draw a LOT of power. If you're on a power budget, AMD still
does quite well with their latest CPUs.
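
If you want to see the "fill in the bubbles" effect yourself, here is a
minimal, untested sketch of the kind of toy benchmark I mean (Linux +
gcc assumed; the logical CPU numbers you pass in are just guesses --
check /sys/devices/system/cpu/cpu*/topology/thread_siblings_list to
find which pairs are HT siblings on your box). One thread chases
pointers through a 256 MB array (memory-latency bound), the other spins
on register arithmetic (execution bound). Run the pair once pinned to
the two HT siblings of one core and once pinned to two separate cores
and compare the wall time; on a Nehalem the sibling case should lose
much less than the naive "half a core each" picture suggests.

/* smt_sketch.c -- hypothetical toy, not from this thread */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (32 * 1024 * 1024)        /* 32M size_t entries = 256 MB, >> caches */
static size_t *chain;

static void pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

/* Memory-latency bound: every load depends on the previous one and
 * almost always misses cache. */
static void *pointer_chaser(void *arg)
{
    pin_to_cpu(*(int *)arg);
    size_t i = 0;
    for (long n = 0; n < 50 * 1000 * 1000L; n++)
        i = chain[i];
    return (void *)i;               /* keep the result live */
}

/* Execution bound: pure register arithmetic, no memory traffic. */
static void *integer_spinner(void *arg)
{
    pin_to_cpu(*(int *)arg);
    unsigned long x = 1;
    for (long n = 0; n < 2 * 1000 * 1000 * 1000L; n++)
        x = x * 2654435761UL + 12345;
    return (void *)x;
}

int main(int argc, char **argv)
{
    /* Which two logical CPUs to use; the defaults are a guess. */
    int cpu_a = argc > 1 ? atoi(argv[1]) : 0;
    int cpu_b = argc > 2 ? atoi(argv[2]) : 1;

    chain = malloc((size_t)N * sizeof(size_t));
    if (chain == NULL) {
        perror("malloc");
        return 1;
    }
    /* Full-period LCG gives a pseudo-random permutation to chase. */
    for (size_t i = 0; i < N; i++)
        chain[i] = (i * 2654435761UL + 1) % N;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    pthread_t a, b;
    pthread_create(&a, NULL, pointer_chaser, &cpu_a);
    pthread_create(&b, NULL, integer_spinner, &cpu_b);
    pthread_join(a, NULL);
    pthread_join(b, NULL);

    clock_gettime(CLOCK_MONOTONIC, &t1);
    printf("elapsed: %.2f s\n",
           (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9);
    return 0;
}

Build with something like "gcc -O2 -pthread smt_sketch.c -o smt_sketch"
and run it as, say, "./smt_sketch 0 4" vs "./smt_sketch 0 1"
(substituting whatever sibling/non-sibling pairs your topology files
report). On the old P4-style HT the gap between the two runs should be
much smaller, for the reasons above.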