Re: Hyper Threaded Pentium 4 - a bit long


 



At 07:35 PM 10/14/2002, you wrote:
>I'm new to the CPU-level stuff and don't have much background in it, so 
>some of the terms are new to me. Let me get really basic: why would you 
>want multiple processors on a motherboard as opposed to a single fast 
>processor?

More is always better, unless you start talking about a drowning man and 
cups of water.  Ideally, of course, you would want the fastest processors 
in the greatest numbers.  In some multi-user environments, jobs are 
numerous but small, and many users run them simultaneously.  In that case, 
a multi-processor machine is preferable to a single-processor machine, 
since the jobs can run at the same time instead of being queued.  A single 
long-running job benefits more from a faster processor.  What you want to 
keep in mind when looking at a multi-processor machine is the type of 
application(s) you plan on running.  If the application is not 
multi-threaded, you will not benefit from the extra processor(s) unless you 
are running multiple jobs simultaneously.
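
To illustrate the last point, here is a minimal Python sketch, not taken 
from any real workload.  It assumes each job is an independent, 
single-threaded, CPU-bound task; certification_job and run_batch are 
made-up names.  Each job uses only one processor, but a batch of them can 
be spread over several.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor


def certification_job(n):
    """Hypothetical stand-in for one single-threaded, CPU-bound job."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_batch(job_sizes, workers):
    """Run independent jobs on `workers` processes; return (results, seconds)."""
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(certification_job, job_sizes))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    sizes = [2_000_000] * 4  # four identical jobs (arbitrary size)
    for workers in (1, os.cpu_count() or 1):
        _, elapsed = run_batch(sizes, workers)
        print(f"{workers} worker(s): {elapsed:.2f}s")
```

With one worker the jobs queue up behind each other; with one worker per 
processor they can run side by side.  How much that helps depends on the 
machine and on the jobs.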

>I take it that the Xeon line is for multiple CPU motherboards -- you don't 
>just run one Xeon, am I right? What does it mean, to be 'cpu cache bound'?

Yes, Xeons were designed for MP.  From what I understand, you can run a 
single Xeon, which in effect is just a P4.  You can of course read more 
and see a visual comparison at: 
http://intel.com/support/processors/xeon/diff.htm and 
http://intel.com/support/processors/pentium4/p4compare.htm or google 
it.   As for being cache-bound (my layman's understanding), just think of 
the cache as a super-fast storage area.  Instead of having to pull 
the data across the FSB, the data is stored on the CPU die, making access 
about as fast as it can get.  A job is cache-bound when its speed depends 
on its working data staying in that cache.  More info 
at:  http://www-2.cs.cmu.edu/~tcm/thesis/subsubsection2_10_1_3_2.html

>Do your comments also mean the Red Hat kernel won't need testing on the 
>new Hyper Threaded P4s?

I have a couple of dual 2GHz Xeons, each with 2GB of PC800, one running 
hyperthreaded and one running normally.  Depending on the job, the overhead 
associated with running hyperthreaded is enormous.  Here are some stats for 
you:

[cph@blur ~]$ cat /proc/version /proc/cpuinfo /proc/meminfo
Linux version 2.4.9-34smp (bhcompile@daffy.perf.redhat.com) (gcc version 
2.96 20000731 (Red Hat Linux 7.2 2.96-108.1)) #1 SMP Sat Jun 1 06:15:25 EDT 
2002
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) XEON(TM) CPU 2.00GHz
stepping        : 4
cpu MHz         : 1995.162
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 3984.58

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) XEON(TM) CPU 2.00GHz
stepping        : 4
cpu MHz         : 1995.162
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr mca cmov 
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 3984.58

         total:    used:    free:  shared: buffers:  cached:
Mem:  2107416576 2077827072 29589504        0 458694656 1494269952
Swap: 1077501952        0 1077501952
MemTotal:      2058024 kB
MemFree:         28896 kB
MemShared:           0 kB
Buffers:        447944 kB
Cached:        1459248 kB
SwapCached:          0 kB
Active:        1211376 kB
Inact_dirty:    454344 kB
Inact_clean:    241472 kB
Inact_target:   524016 kB
HighTotal:     1178560 kB
HighFree:        15324 kB
LowTotal:       879464 kB
LowFree:         13572 kB
SwapTotal:     1052248 kB
SwapFree:      1052248 kB

and the machine with hyperthreading enabled shows:

[cph@conroe ~]$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) XEON(TM) CPU 2.00GHz
stepping        : 4
cpu MHz         : 1995.164
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 3984.58

processor       : 1
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) XEON(TM) CPU 2.00GHz
stepping        : 4
cpu MHz         : 1995.164
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 3984.58

processor       : 2
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) XEON(TM) CPU 2.00GHz
stepping        : 4
cpu MHz         : 1995.164
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 3984.58

processor       : 3
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) XEON(TM) CPU 2.00GHz
stepping        : 4
cpu MHz         : 1995.164
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 3984.58

[cph@conroe ~]$ cat /proc/meminfo
         total:    used:    free:  shared: buffers:  cached:
Mem:  2113667072 2008354816 105312256        0 212000768 1683333120
Swap: 2146754560        0 2146754560
MemTotal:      2064128 kB
MemFree:        102844 kB
MemShared:           0 kB
Buffers:        207032 kB
Cached:        1643880 kB
SwapCached:          0 kB
Active:         948732 kB
Inact_dirty:    848948 kB
Inact_clean:     62824 kB
Inact_target:   372100 kB
HighTotal:     1178560 kB
HighFree:        19404 kB
LowTotal:       885568 kB
LowFree:         83440 kB
SwapTotal:     2096440 kB
SwapFree:      2096440 kB
Committed_AS:     7888 kB
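
Note that both listings above carry the "ht" flag; what differs is the 
number of processor entries (two versus four).  A small Python sketch of 
that check follows.  It assumes the "key : value" layout shown above with 
each field on one line (the wrapped flags lines above come from the mail 
client), and summarize_cpuinfo is a made-up name.

```python
import os


def summarize_cpuinfo(text):
    """Return (number of 'processor' entries, set of CPU flags seen)."""
    processors = 0
    flags = set()
    for line in text.splitlines():
        if ":" not in line:
            continue  # blank separator lines between processors
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "processor":
            processors += 1
        elif key == "flags":
            flags.update(value.split())
    return processors, flags


if __name__ == "__main__":
    path = "/proc/cpuinfo"  # Linux only
    if os.path.exists(path):
        with open(path) as f:
            count, flags = summarize_cpuinfo(f.read())
        print("logical processors:", count)
        print("ht flag present:", "ht" in flags)
```

The flag only says the CPU is capable of hyperthreading, so the processor 
count is the more telling number here.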


As a speed test, I ran some test certifications of our statistical 
software.  The machine running in hyperthreaded mode was significantly 
slower than the dual Xeon running in "native" mode.  We have a 
certification script that I put in a batch, kicking off two copies on the 
non-hyperthreaded dual Xeon and four on the dual Xeon running 
hyperthreaded.

non-hyper:

real    33m38.261s
user    31m38.750s
sys     1m48.790s

real    33m40.230s
user    31m40.630s
sys     1m48.660s

hyperthread enabled:

real    58m31.635s
user    56m5.110s
sys     2m5.660s

real    58m43.463s
user    56m10.390s
sys     2m8.450s

real    58m56.267s
user    56m20.470s
sys     2m10.940s

real    58m59.632s
user    56m27.340s
sys     2m6.860s
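
One rough way to read these timings is sketched below in Python.  It 
assumes every run of the certification script does the same amount of work 
and that the jobs in each batch started together; neither is stated above. 
The helper names are made up.  It compares the mean "real" time per job and 
the number of jobs each batch would finish per hour.

```python
import re

# "real" times copied from the runs above
NON_HT_REAL = ["33m38.261s", "33m40.230s"]
HT_REAL = ["58m31.635s", "58m43.463s", "58m56.267s", "58m59.632s"]


def to_seconds(stamp):
    """Convert a `time`-style stamp such as '33m38.261s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def summarize(stamps):
    """Return (mean seconds per job, jobs per hour for the whole batch)."""
    seconds = [to_seconds(s) for s in stamps]
    mean = sum(seconds) / len(seconds)
    batch_time = max(seconds)  # the batch ends when its slowest job ends
    return mean, len(seconds) * 3600 / batch_time


if __name__ == "__main__":
    mean_non, rate_non = summarize(NON_HT_REAL)
    mean_ht, rate_ht = summarize(HT_REAL)
    print(f"per-job time ratio (HT / non-HT): {mean_ht / mean_non:.2f}")
    print(f"jobs per hour: non-HT {rate_non:.2f}, HT {rate_ht:.2f}")
```

Under those assumptions, each job took roughly 1.75 times as long with 
hyperthreading enabled, while the four-job batch completed about 14% more 
jobs per hour than the two-job batch.  The arithmetic says nothing about 
where the extra per-job time went.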

So, while it could be argued that the hyperthreaded machine suffered a bit 
from being I/O bound on the hard drive, that period of time was negligible 
compared with the time lost to being cache-bound, as described by 
Mr. Flory below.  Also note that there is some overhead, which I have not 
investigated, in how the kernel handles hyperthreading.  A top shows:

   PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
     1 root      15   0   404  404   356 S     0.0  0.0   0:10 init
     2 root      0K   0     0    0     0 SW    0.0  0.0   0:00 migration_CPU0
     3 root      0K   0     0    0     0 SW    0.0  0.0   0:00 migration_CPU1
     4 root      0K   0     0    0     0 SW    0.0  0.0   0:00 migration_CPU2
     5 root      0K   0     0    0     0 SW    0.0  0.0   0:00 migration_CPU3
     6 root      15   0     0    0     0 SW    0.0  0.0   0:00 keventd
     7 root      34  19     0    0     0 SWN   0.0  0.0   0:02 ksoftirqd_CPU0
     8 root      34  19     0    0     0 SWN   0.0  0.0   0:00 ksoftirqd_CPU1
     9 root      34  19     0    0     0 SWN   0.0  0.0   0:00 ksoftirqd_CPU2
    10 root      34  19     0    0     0 SWN   0.0  0.0   0:00 ksoftirqd_CPU3

Note that "migration_CPUX" is not seen on the non-hyperthreaded machine.  
What is interesting is that the migration threads never consume time, 
memory or CPU.  hmmmmmmmm.

Pete


>Thanks
>
>Bob Cochran
>
>Samuel Flory wrote:
>
>>Red Hat has supported this since one of the 7.2 kernel updates. This is 
>>old hat on the current crop of Xeons (aka P4 Xeon). Linux treats them as 
>>multiple cpus. Don't assume that this will make your system faster. If 
>>you tend to have only one process active at a time then it will slow 
>>things down. It's also really bad if you are cpu cache bound.
>>
>>
>>
>
>
>
>--
>Psyche-list mailing list
>Psyche-list@redhat.com
>https://listman.redhat.com/mailman/listinfo/psyche-list


--------------------------
Pete Huckelba

Stata Corporation
4905 Lakeway Drive
College Station, TX  77845
(979)696-4600


