JD wrote:
> Correct James. The clobbering of the cache by 2 different threads
> does not depend on whether or not the cpu is hyperthreaded.
> Any two threads can achieve this clobering on any cpu, and it is
> often the case.

This last sentence is true, but with normal multitasking, and no hardware multi-threading, each software thread gets a slice of the processor time to itself: usually several million clock cycles, these days¹. So the thread has a chance to fill the level 1 cache with its own data² before another thread gets a look in. With multi-threading, each thread is *constantly* clobbering the other's data.

> The only situation where hyperthreading will show noticeable
> improvement of execution speed is where the threads are all
> children of the same process and are well behaved and work
> almost entirely on the parent process' data space, with proper
> synchronization. However, if the parent data space and text
> space is larger than the cache, then the sibling threads can
> still cause cache refill every time a sibling accesses a different
> data space than other siblings. Ditto with the instruction cache.
> Different threads have a different set of instructions.

This does not appear to match reality for all processors. The Pentium 4 was both the first generally-available processor with multi-threading, and a pretty poor example of multi-threading. So a lot of people got a poor first impression.

Even there, there were other cases when multi-threading made a lot of sense: if, for example, the algorithm was such that you're going to get mostly cache misses *anyway*, then you might as well have two threads hanging around waiting for data as one.

Other processors (current Core i7 and i5, for example) tend not to have such a microscopic Level 1 cache, so there's more chance for both working sets to fit in cache at the same time.
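
To put rough numbers on the working-set point, here is a minimal back-of-envelope sketch in Python. The cache sizes and the 12 KiB working set are illustrative assumptions, not figures from this thread, and the helper name fits_in_shared_l1 is made up for the example. It also ignores associativity and cache-line granularity, so it only shows the shape of the argument.

```python
# Illustrative only: every size below is an assumption, not a measurement.
KIB = 1024


def fits_in_shared_l1(l1_bytes, working_set_bytes, hw_threads=2):
    """True if all hardware threads' working sets fit in one shared L1 together."""
    return working_set_bytes * hw_threads <= l1_bytes


if __name__ == "__main__":
    working_set = 12 * KIB  # hypothetical per-thread working set
    for name, l1 in (("small L1 (8 KiB)", 8 * KIB), ("larger L1 (32 KiB)", 32 * KIB)):
        if fits_in_shared_l1(l1, working_set):
            verdict = "can sit in cache together"
        else:
            verdict = "will keep evicting each other"
        print(f"{name}: two {working_set // KIB} KiB working sets {verdict}")
```

With a small shared cache the two threads cannot both keep their data resident; with a larger one they can, which is the case where the second hardware thread stops being a pure cost.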

http://www.realworldtech.com/beta/forums/index.cfm?action=detail&id=89001&threadid=89001&roomid=2 (and following thread) gives a link to an Intel benchmark claiming a 50%+ performance improvement due to hyperthreading on Atom. Linus Torvalds³ effectively says "it's easy to get 50% performance improvements if the CPU can't make good use of all its resources with just one thread."

I'd note, too, that Bulldozer's FPU is effectively multi-threaded, and that doesn't use Level 1 data cache *at all*: the data all comes from Level 2. AMD apparently believes they can get enough out-of-order re-ordering to hide the latency.

> My basic attitude is forget hyperthreading. IMHO it is largely
> a hype!

You know, I'd actually agree with that on the desktop⁴, but for different reasons. The number of hardware threads has mushroomed over the last ten years, but desktop software is still largely single-threaded. It's still fairly rare for there to be a situation where desktop software can make efficient use of six or eight threads. The main exceptions are things like transcoding and compression (and few people buy desktops to do that) and compiling large software projects, like the Linux kernel. Personally, I prefer to let the Fedora Project do most of that for me!

Hope this helps,
James.

¹ IF the thread needs it.

² You don't need the entire program in cache, just the bits that the program is currently using.

³ As far as we can tell, yes, *that* Linus. He certainly has the same use of language, the same arguing style, and knows stuff the real Linus would.

⁴ Servers often do have enough software threads to make use of all the hardware threads they can get; see Sun's Niagara for an example. And single-core Atoms benefit from hyperthreading to improve latency.

-- 
E-mail: james@         | "My aunt's camel has fallen in the mirage."
aprilcottage.co.uk     |     -- "Soul Music", Terry Pratchett.