On 1/18/18 10:38 AM, Rebecca Cran wrote:
> I added code to lib/memcpy.c to test sse and avx performance, and found
> that on modern systems memcpy outperforms both by quite some margin
> (GB/s) on the larger block sizes: the only place sse/avx is an
> improvement was on an older SandyBridge EP system - I've copied the
> output below.
>
> Should I work on a patch to commit the changes, or just abandon them
> since it seems the current memcpy implementation used in the mmap engine
> is the best solution on modern machines?

The upside would be having an implementation that is independent of the
OS, the downside is the (significant) extra maintenance burden and the
differing results on different machines.

The synthetic test case is a bit misleading, I think. avx/sse might
yield great results for small sizes, but in actual workloads, having to
save/restore state across context switches will add overhead. The simple
throughput test case doesn't include that.

Adding the memcpy for avx/sse to the test case might be interesting
though, just to be able to compare performances with builtin
memcpy/memmove on a given system.

-- 
Jens Axboe
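For illustration only, here is a rough sketch of the kind of comparison
discussed above: time a copy routine over a range of block sizes and
report throughput. This is not the code in fio's lib/memcpy.c; the names
copy_fn, bench_copy and compare_builtin_copies are made up for this
example, and the block sizes and the 256 MiB-per-measurement target are
arbitrary assumptions. An sse/avx routine with the same signature could be
passed in next to the builtins. As noted above, a tight loop like this
does not capture the cost of saving/restoring vector state across context
switches, so it says nothing about real workloads.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Signature shared by memcpy, memmove and any hand-written copy routine. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/*
 * Copy `size` bytes `iters` times using `fn` and return the throughput in
 * MiB/s, measured in CPU time via clock(). Returns -1.0 if allocation fails
 * or the destination does not match the source afterwards, and 0.0 if the
 * run was too short for clock() to measure.
 */
double bench_copy(copy_fn fn, size_t size, unsigned long iters)
{
    /* Volatile pointer: forces a real indirect call on every iteration. */
    copy_fn volatile call = fn;
    unsigned char *src, *dst;
    clock_t start, end;
    double secs, mib_per_sec = -1.0;
    unsigned long i;

    assert(fn != NULL && size > 0 && iters > 0);

    src = malloc(size);
    dst = malloc(size);
    if (!src || !dst)
        goto out;

    memset(src, 0xa5, size);
    memset(dst, 0, size);

    start = clock();
    for (i = 0; i < iters; i++)
        call(dst, src, size);
    end = clock();

    /* Sanity check: the routine must actually have copied the data. */
    if (memcmp(dst, src, size) != 0)
        goto out;

    secs = (double)(end - start) / CLOCKS_PER_SEC;
    if (secs > 0.0)
        mib_per_sec = ((double)size * (double)iters) / (1024.0 * 1024.0) / secs;
    else
        mib_per_sec = 0.0;
out:
    free(src);
    free(dst);
    return mib_per_sec;
}

/* Print memcpy vs. memmove throughput for a few example block sizes. */
void compare_builtin_copies(void)
{
    static const size_t sizes[] = { 64, 4096, 65536, 1 << 20 };
    size_t i;

    printf("%10s %14s %14s\n", "bytes", "memcpy MiB/s", "memmove MiB/s");
    for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
        /* Aim for roughly 256 MiB copied per measurement. */
        unsigned long iters = (unsigned long)((256UL << 20) / sizes[i]);

        printf("%10zu %14.1f %14.1f\n", sizes[i],
               bench_copy(memcpy, sizes[i], iters),
               bench_copy(memmove, sizes[i], iters));
    }
}
```

Calling compare_builtin_copies() from a main() prints one row per block
size. The numbers will differ from machine to machine, which is the point
of running it on a given system rather than relying on one set of results.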