Alan,

You are a world-famous Linux kernel developer and one of the very few people able to unravel this mystery. Below is the message I sent to the users list a few days ago:

****************************************************************

Hardware platform: Dell Studio XPS, Core i7 950, 8 MB L3 cache.
Operating systems: Fedora 15 x64 / Windows 7 x64 Ultimate

Hi everybody,

I have developed a sorting/searching library written in assembly language. As long as the data stays in the L1 cache (in-place physical sorts), speeds are identical. When the proportion of L1 cache misses is high, however (sorts by reference, which return an ordering vector as APL sorts do), Fedora dramatically outperforms Windows 7. The gap is stunning but consistent, and I am not exaggerating.

Something odd happens as soon as the data leaves the L1 cache. While it still fits in the L3 cache (my Core i7 950 has an 8 MB L3), Windows 7 is about 33% slower; with mixed L3 cache/main memory accesses the penalty grows to 50-60%! The library is parallelized, but the gap is the same when only one logical processor is running.

I wish some x64 Linux kernel developer could enlighten me. The assembly code is exactly the same in both cases (except, of course, that Windows API calls are replaced with Linux system calls), and both builds are assembled with JWASM. No disk swapping, large/huge pages, or virtual machines are involved, and my test program is a plain application run from the command line.

What's going on?

Quicksort

--
users mailing list
users@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe or change subscription options:
https://admin.fedoraproject.org/mailman/listinfo/users
Guidelines: http://fedoraproject.org/wiki/Mailing_list_guidelines
Have a question? Ask away: http://ask.fedoraproject.org
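For readers unfamiliar with APL-style sorts by reference, here is a minimal sketch in C of why they stress the cache so much more than in-place sorts. It is not the author's assembly library, whose algorithm is not described in the message. It assumes a bottom-up merge sort over an index vector, and the names `grade_up` and `merge_runs` are hypothetical. The keys are never moved. Instead, every comparison loads `keys[idx[i]]` through the index vector. After a few merge passes, neighbouring entries of `idx` point to unrelated parts of `keys`, so once the key array no longer fits in L1 those loads become scattered cache misses. An in-place sort touching contiguous memory does not have this problem.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Merge the sorted runs idx[lo..mid) and idx[mid..hi) into tmp by
   comparing the keys they refer to, then copy the result back.
   On ties the left run wins, so the sort is stable. */
static void merge_runs(const double *keys, size_t *idx, size_t *tmp,
                       size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi) {
        /* Indirect loads: keys[idx[...]] jumps around in memory once
           idx has been partially permuted. */
        if (keys[idx[j]] < keys[idx[i]])
            tmp[k++] = idx[j++];
        else
            tmp[k++] = idx[i++];
    }
    while (i < mid) tmp[k++] = idx[i++];
    while (j < hi)  tmp[k++] = idx[j++];
    memcpy(idx + lo, tmp + lo, (hi - lo) * sizeof *idx);
}

/* Hypothetical grade-up: fill idx[0..n) with the permutation that
   orders keys ascending, leaving keys itself untouched.
   Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int grade_up(const double *keys, size_t n, size_t *idx)
{
    assert(n == 0 || (keys != NULL && idx != NULL));
    for (size_t i = 0; i < n; i++)
        idx[i] = i;                 /* start from the identity ordering */
    if (n < 2)
        return 0;

    size_t *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    /* Bottom-up merge sort on the index vector: runs of width 1, 2, 4... */
    for (size_t width = 1; width < n; width *= 2) {
        for (size_t lo = 0; lo < n - width; lo += 2 * width) {
            size_t mid = lo + width;
            size_t hi = (mid + width < n) ? mid + width : n;
            merge_runs(keys, idx, tmp, lo, mid, hi);
        }
    }
    free(tmp);
    return 0;
}
```

For example, grading the keys {3.0, 1.0, 2.0, 1.0} yields the ordering vector {1, 3, 2, 0}. The two equal keys keep their original relative order. Merge sort was chosen here only because it is stable and simple to write without non-portable `qsort_r` variants. The same indirect-access pattern arises with any comparison sort that works on an index vector.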