Re: Hyper Threaded Pentium 4 - a bit long


 



Pete, thanks a lot! I'm learning.

Bob

Pete Huckelba wrote:

> At 07:35 PM 10/14/2002, you wrote:
>
>> I'm new to the CPU-level stuff and don't have much background in 
>> it, so some of the terms are new to me. Let me get really basic: 
>> why would you want multiple processors on a motherboard as opposed to 
>> a single fast processor?
>
>
> More is always better, unless you start talking about a drowning man 
> and cups of water. As for wanting multiple processors instead of a 
> single fast one, you of course would always want to have the fastest 
> processor(s) in the greatest numbers. In some multi-user environments 
> jobs are numerous but small, and are often run by many users 
> simultaneously. In this case, a multi-processor machine would be 
> preferable to a single-processor machine, since the jobs can be run 
> at the same time without being queued. Jobs that take more time will 
> benefit from a faster processor. What you want to keep in mind when 
> looking at a multi-processor machine is the type of application(s) you 
> plan on running. If the application is not multi-threaded, you will 
> not benefit from the extra processor(s) unless you are running 
> multiple jobs simultaneously.
>
>> I take it that the Xeon line is for multiple CPU motherboards -- you 
>> don't just run one Xeon, am I right? What does it mean, to be 'cpu 
>> cache bound'?
>
>
> Yes, Xeons were designed for MP. From what I understand, you can run 
> a single Xeon, which in effect is just a P4. You can of course read 
> more and see a visual comparison at: 
> http://intel.com/support/processors/xeon/diff.htm and 
> http://intel.com/support/processors/pentium4/p4compare.htm or google 
> it. As for being cache-bound, (my layman's understanding) just think 
> of the cache as being a super-fast storage area. Instead of having to 
> pull the data across the FSB, the data is stored on the CPU die, making 
> access time as close to real time as you can get. More info at: 
> http://www-2.cs.cmu.edu/~tcm/thesis/subsubsection2_10_1_3_2.html
>
>> Do your comments also mean the Red Hat kernel won't need testing on 
>> the new Hyper Threaded P4s?
>
>
> I have a couple of dual 2 GHz Xeons, each with 2 GB of PC800, one running 
> hyperthreaded, one running normally. Depending on the job, the overhead 
> associated with running hyperthreaded is enormous. Here are some stats 
> for you:
>
> [cph@blur ~]$ cat /proc/version /proc/cpuinfo /proc/meminfo
> Linux version 2.4.9-34smp (bhcompile@daffy.perf.redhat.com) (gcc 
> version 2.96 20000731 (Red Hat Linux 7.2 2.96-108.1)) #1 SMP Sat Jun 1 
> 06:15:25 EDT 2002
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 15
> model : 2
> model name : Intel(R) XEON(TM) CPU 2.00GHz
> stepping : 4
> cpu MHz : 1995.162
> cache size : 512 KB
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 2
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr mca cmov pat 
> pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
> bogomips : 3984.58
>
> processor : 1
> vendor_id : GenuineIntel
> cpu family : 15
> model : 2
> model name : Intel(R) XEON(TM) CPU 2.00GHz
> stepping : 4
> cpu MHz : 1995.162
> cache size : 512 KB
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 2
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr mca cmov pat 
> pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
> bogomips : 3984.58
>
> total: used: free: shared: buffers: cached:
> Mem: 2107416576 2077827072 29589504 0 458694656 1494269952
> Swap: 1077501952 0 1077501952
> MemTotal: 2058024 kB
> MemFree: 28896 kB
> MemShared: 0 kB
> Buffers: 447944 kB
> Cached: 1459248 kB
> SwapCached: 0 kB
> Active: 1211376 kB
> Inact_dirty: 454344 kB
> Inact_clean: 241472 kB
> Inact_target: 524016 kB
> HighTotal: 1178560 kB
> HighFree: 15324 kB
> LowTotal: 879464 kB
> LowFree: 13572 kB
> SwapTotal: 1052248 kB
> SwapFree: 1052248 kB
>
> and the machine with hyperthreading enabled shows:
>
> [cph@conroe ~]$ cat /proc/cpuinfo
> processor : 0
> vendor_id : GenuineIntel
> cpu family : 15
> model : 2
> model name : Intel(R) XEON(TM) CPU 2.00GHz
> stepping : 4
> cpu MHz : 1995.164
> cache size : 512 KB
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 2
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
> bogomips : 3984.58
>
> processor : 1
> vendor_id : GenuineIntel
> cpu family : 15
> model : 2
> model name : Intel(R) XEON(TM) CPU 2.00GHz
> stepping : 4
> cpu MHz : 1995.164
> cache size : 512 KB
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 2
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
> bogomips : 3984.58
>
> processor : 2
> vendor_id : GenuineIntel
> cpu family : 15
> model : 2
> model name : Intel(R) XEON(TM) CPU 2.00GHz
> stepping : 4
> cpu MHz : 1995.164
> cache size : 512 KB
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 2
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
> bogomips : 3984.58
>
> processor : 3
> vendor_id : GenuineIntel
> cpu family : 15
> model : 2
> model name : Intel(R) XEON(TM) CPU 2.00GHz
> stepping : 4
> cpu MHz : 1995.164
> cache size : 512 KB
> fdiv_bug : no
> hlt_bug : no
> f00f_bug : no
> coma_bug : no
> fpu : yes
> fpu_exception : yes
> cpuid level : 2
> wp : yes
> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
> bogomips : 3984.58
>
> [cph@conroe ~]$ cat /proc/meminfo
> total: used: free: shared: buffers: cached:
> Mem: 2113667072 2008354816 105312256 0 212000768 1683333120
> Swap: 2146754560 0 2146754560
> MemTotal: 2064128 kB
> MemFree: 102844 kB
> MemShared: 0 kB
> Buffers: 207032 kB
> Cached: 1643880 kB
> SwapCached: 0 kB
> Active: 948732 kB
> Inact_dirty: 848948 kB
> Inact_clean: 62824 kB
> Inact_target: 372100 kB
> HighTotal: 1178560 kB
> HighFree: 19404 kB
> LowTotal: 885568 kB
> LowFree: 83440 kB
> SwapTotal: 2096440 kB
> SwapFree: 2096440 kB
> Committed_AS: 7888 kB
>
>
> As a speed test, I ran some test certifications of our statistical 
> software. The machine running in hyperthreaded mode was significantly 
> slower than the dual Xeon running in "native" mode. We have a 
> certification script that I put in a batch, kicking off two copies on 
> the dual Xeon and four on the dual Xeon running hyperthreaded.
>
> non-hyper:
>
> real 33m38.261s
> user 31m38.750s
> sys 1m48.790s
>
> real 33m40.230s
> user 31m40.630s
> sys 1m48.660s
>
> hyperthread enabled:
>
> real 58m31.635s
> user 56m5.110s
> sys 2m5.660s
>
> real 58m43.463s
> user 56m10.390s
> sys 2m8.450s
>
> real 58m56.267s
> user 56m20.470s
> sys 2m10.940s
>
> real 58m59.632s
> user 56m27.340s
> sys 2m6.860s
>
> So, while it could be argued that the hyperthreaded machine suffered a 
> bit from being I/O bound on the hard drive, that period of time was 
> negligible compared with being cache-bound, as stated by Mr. Flory 
> below. Note also that there is some overhead, which I have not 
> investigated, in how the kernel handles hyperthreading. Output from 
> top shows:
>
> PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
> 1 root 15 0 404 404 356 S 0.0 0.0 0:10 init
> 2 root 0K 0 0 0 0 SW 0.0 0.0 0:00 migration_CPU0
> 3 root 0K 0 0 0 0 SW 0.0 0.0 0:00 migration_CPU1
> 4 root 0K 0 0 0 0 SW 0.0 0.0 0:00 migration_CPU2
> 5 root 0K 0 0 0 0 SW 0.0 0.0 0:00 migration_CPU3
> 6 root 15 0 0 0 0 SW 0.0 0.0 0:00 keventd
> 7 root 34 19 0 0 0 SWN 0.0 0.0 0:02 ksoftirqd_CPU0
> 8 root 34 19 0 0 0 SWN 0.0 0.0 0:00 ksoftirqd_CPU1
> 9 root 34 19 0 0 0 SWN 0.0 0.0 0:00 ksoftirqd_CPU2
> 10 root 34 19 0 0 0 SWN 0.0 0.0 0:00 ksoftirqd_CPU3
>
> The "migration_CPUX" threads are not seen on the non-hyperthreaded 
> machine. What is interesting to notice is that the migration threads 
> never consume time, memory or CPU. hmmmmmmmm.
>
> Pete
>
>
>> Thanks
>>
>> Bob Cochran
>>
>> Samuel Flory wrote:
>>
>>> Red Hat has supported this since one of the 7.2 kernel updates. This 
>>> is old hat on the current crop of Xeons (aka P4 Xeon). Linux treats 
>>> them as multiple CPUs. Don't assume that this will make your system 
>>> faster. If you tend to have only one process active at a time then it 
>>> will slow things down. It's also really bad if you are CPU cache bound.
>>>
>>>
>>>
>>
>>
>>
>> --
>> Psyche-list mailing list
>> Psyche-list@redhat.com
>> https://listman.redhat.com/mailman/listinfo/psyche-list
>
>
>
> --------------------------
> Pete Huckelba
>
> Stata Corporation
> 4905 Lakeway Drive
> College Station, TX 77845
> (979)696-4600
>
>
>
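
A side note on the two /proc/cpuinfo listings quoted above: both machines
report the "ht" flag, so the flag alone does not seem to show whether
hyperthreading is switched on. What differs is the number of "processor"
entries (2 on blur, 4 on conroe). A minimal Python sketch of that check
follows. The function name and the abbreviated sample text are made up
for illustration and are not taken from any existing tool.

```python
def summarize_cpuinfo(text):
    """Return (logical_cpu_count, ht_flag_present) for /proc/cpuinfo-style text."""
    logical = 0
    ht_flag = False
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processor entries
        key = key.strip()
        if key == "processor":
            logical += 1
        elif key == "flags" and "ht" in value.split():
            ht_flag = True
    return logical, ht_flag


# Abbreviated, hypothetical sample: four entries, as on the hyperthreaded box.
SAMPLE = "\n".join(
    "processor : %d\nflags : fpu mmx sse sse2 ss ht tm\n" % n for n in range(4)
)

logical, ht_flag = summarize_cpuinfo(SAMPLE)
print("logical CPUs:", logical, "| ht flag present:", ht_flag)
```

On a live Linux system the same function could be fed the contents of
/proc/cpuinfo instead of SAMPLE. A dual-socket board showing four entries
would then suggest that hyperthreading is enabled.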


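One rough way to read the timings quoted above is to separate per-job
time from batch throughput, since the non-hyperthreaded box ran two jobs
and the hyperthreaded box ran four. The sketch below assumes that all
jobs in a batch started together and that every job does the same work.
The helper names are hypothetical.

```python
# Wall-clock ("real") times copied from the quoted `time` output.
NON_HT_REAL = ["33m38.261s", "33m40.230s"]
HT_REAL = ["58m31.635s", "58m43.463s", "58m56.267s", "58m59.632s"]


def to_seconds(stamp):
    """Convert a `time`-style stamp such as '33m38.261s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


def mean_seconds(real_times):
    """Average wall-clock time of one job in the batch."""
    return sum(to_seconds(t) for t in real_times) / len(real_times)


def jobs_per_hour(real_times):
    """Batch throughput, assuming every job started at the same moment."""
    batch_seconds = max(to_seconds(t) for t in real_times)
    return len(real_times) * 3600 / batch_seconds


slowdown = mean_seconds(HT_REAL) / mean_seconds(NON_HT_REAL)
throughput_ratio = jobs_per_hour(HT_REAL) / jobs_per_hour(NON_HT_REAL)
print("per-job slowdown with HT: %.2fx" % slowdown)
print("batch throughput, HT vs non-HT: %.2fx" % throughput_ratio)
```

On the posted figures, each job takes roughly 1.75 times as long with
hyperthreading enabled, while the batch of four completes about 14% more
jobs per hour than the batch of two. This is only arithmetic on the
numbers in the thread. The two machines are not identical in every
respect (swap size differs, for example), so it is a rough reading and
not a benchmark.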

