Re: Optimizing mmap_queue on AVX/AVX2 CPUs

Sitsofe Wheeler <sitsofe@xxxxxxxxx> · Thu, 7 Sep 2017 07:00:55 +0100

On 6 September 2017 at 21:54, Rebecca Cran <rebecca@xxxxxxxxxxxx> wrote:
>
>> On Sep 6, 2017, at 2:20 PM, Sitsofe Wheeler <sitsofe@xxxxxxxxx> wrote:
>
>> Does that mean your assembly copy is better than memcpy on generic
>> data going memory-memory or is it just in relation to copying to
>> block devices?
>
> I'm testing memory-based filesystems (mounted with DAX) using the mmap ioengine - either against an NVDIMM-N DDR4 module or on FreeBSD against an md device.
>
> Both my code using assembly intrinsics and standard loops optimized with -ftree-vectorize are better than generic memcpy.

When this gets added will it be possible for fio to have a "memcpy
benchmark" mode where you're able to compare implementations when
using a fixed block size (in a similar way to --crctest) or does this
not make sense because you actually have to be copying to a device to
see the difference?
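
To make the question concrete, here is a rough sketch of what a standalone
memory-to-memory comparison at a fixed block size could look like. This is
not fio code: copy_fn, loop_copy and bench_copy are hypothetical names made
up for illustration, and it only measures DRAM-to-DRAM copies, so it says
nothing about DAX or device-backed mappings.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical signature shared by every copy routine under test. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t len);

/* Plain byte loop; a compiler may vectorize it (e.g. with -ftree-vectorize). */
static void *loop_copy(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	for (size_t i = 0; i < len; i++)
		d[i] = s[i];
	return dst;
}

/*
 * Copy 'bs' bytes 'iters' times with 'fn' and return the throughput in
 * MiB/s, or -1.0 on bad arguments, allocation failure, a failed timer
 * read, a zero elapsed time, or a copy that did not match the source.
 */
static double bench_copy(copy_fn fn, size_t bs, unsigned long iters)
{
	/* volatile pointer: forces a real call on every iteration */
	copy_fn volatile f = fn;
	struct timespec t0, t1;
	unsigned char *src = malloc(bs ? bs : 1);
	unsigned char *dst = malloc(bs ? bs : 1);
	double secs, rate = -1.0;

	if (!src || !dst || !fn || !bs || !iters)
		goto out;

	memset(src, 0xA5, bs);
	memset(dst, 0, bs);

	if (clock_gettime(CLOCK_MONOTONIC, &t0))
		goto out;
	for (unsigned long i = 0; i < iters; i++)
		f(dst, src, bs);
	if (clock_gettime(CLOCK_MONOTONIC, &t1))
		goto out;

	secs = (double)(t1.tv_sec - t0.tv_sec) +
	       (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
	if (secs > 0.0 && memcmp(dst, src, bs) == 0)
		rate = ((double)bs * (double)iters) / (1024.0 * 1024.0) / secs;
out:
	free(src);
	free(dst);
	return rate;
}
```

Calling bench_copy(memcpy, 4096, 1000000) and bench_copy(loop_copy, 4096,
1000000) would give two numbers to compare for a 4 KiB block size; an
intrinsics-based routine with the same signature could be dropped in the
same way.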

-- 
Sitsofe | http://sucs.org/~sits/