On Thu, Nov 18, 2010 at 04:23:56PM +0100, Jakub Jelinek wrote:
> It is very sad that Intel/AMD just didn't make sure rep movsb
> is the fastest copying sequence on all of their CPUs,
> which underneath could do whatever magic based on size and src/dst
> alignment (e.g. for small length handle it in hw so it is as quick as
> possible, for larger sizes perhaps handle it in microcode) - rep movsb
> can be easily inlined and is quite short as well. But on many, especially
> recent, CPUs it performs very badly compared to these much larger SSE* optimized
> routines.
>
> If you want exact numbers, best ask Intel folks who wrote and tuned the
> SSE4.2 memcpy routine.

I wonder if the Intel people who benchmarked memcpy throughput also
benchmarked the increased context switch time that will happen now that
the kernel's lazy FPU state saving is effectively disabled every time
something calls memcpy.

Dave