On Thu, 6 Aug 2009, Artur Skawina wrote:
> >
> > The way it's written, I can easily make it do one or the other by just
> > turning the macro inside a loop (and we can have a preprocessor flag to
> > choose one or the other), but let me work on it a bit more first.
>
> that's of course how i measured it.. :)

Well, with my "rolling 512-bit array" I can't do that easily any more.

Now it actually depends on the compiler being able to statically do that
circular list calculation. If I were to turn it back into the chunks of
loops, my new code would suck, because it would have all those nasty
dynamic address calculations.

> I've only tested on p4 and there the winner so far is still:

Yeah, well, I refuse to touch that crappy micro-architecture any more. I
complained to Intel people for years that their best CPU was only
available as a laptop chip (Pentium-M), and I'm really happy to have
gotten rid of all my horrid P4's.

(Ok, so it was great when the P4 ran at 2x the frequency of the
competition, and then it smoked them all. Except on OS loads, where the P4
exception handling took ten times longer than anything else).

So I'm a bit biased against P4.

I'll try it on my Atom's, though. They're pretty crappy CPU's, but they
have a fairly good _reason_ to be crappy.

		Linus
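
To illustrate the "rolling 512-bit array" point: the following is only a
minimal sketch, not the code from this thread. It assumes the array is the
16-word window of the standard SHA-1 message schedule, and every name in it
(ROL32, W, MIX, mix_unrolled, mix_loop, mix_variants_agree) is made up for
the example. When each step is written out with a literal index, the
"& 15" wrap-around folds to a constant slot at compile time; inside a loop
the same macro has to compute its slots at run time, unless the compiler
happens to unroll the loop itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper names; illustrative only. */
#define ROL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* 16-word (512-bit) circular window: word t lives in slot t mod 16. */
#define W(t) (w[(t) & 15])

/*
 * SHA-1 schedule step: W[t] = rol(W[t-3] ^ W[t-8] ^ W[t-14] ^ W[t-16], 1),
 * with every index reduced mod 16 (so t-3 becomes t+13, and so on).
 */
#define MIX(t) (W(t) = ROL32(W((t) + 13) ^ W((t) + 8) ^ W((t) + 2) ^ W(t), 1))

/* Unrolled: each t is a literal, so (t & 15) folds to a constant slot. */
static void mix_unrolled(uint32_t w[16])
{
	MIX(16); MIX(17); MIX(18); MIX(19);
	MIX(20); MIX(21); MIX(22); MIX(23);
	MIX(24); MIX(25); MIX(26); MIX(27);
	MIX(28); MIX(29); MIX(30); MIX(31);
}

/* Loop: t is a runtime value, so the slot indices are computed dynamically. */
static void mix_loop(uint32_t w[16])
{
	int t;

	for (t = 16; t < 32; t++)
		MIX(t);
}

/* Returns 1 if both variants leave the window in the same state. */
static int mix_variants_agree(void)
{
	uint32_t a[16], b[16];
	int i;

	/* Arbitrary but deterministic starting contents. */
	for (i = 0; i < 16; i++)
		a[i] = b[i] = 0x9e3779b9u * (uint32_t)(i + 1);

	mix_unrolled(a);
	mix_loop(b);
	return memcmp(a, b, sizeof(a)) == 0;
}
```

Both variants compute the same thing (assert(mix_variants_agree()) should
hold); the difference is only in what addressing code the compiler can
generate, which is best checked by comparing the assembly output for the
two functions.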