Re: Optimizing mmap_queue on AVX/AVX2 CPUs

Sitsofe Wheeler <sitsofe@xxxxxxxxx> · Wed, 6 Sep 2017 21:20:31 +0100

On 6 September 2017 at 19:31, Rebecca Cran <rebecca@xxxxxxxxxxxx> wrote:
> On 8/30/2017 2:57 PM, Elliott, Robert (Persistent Memory) wrote:
>>
>> There's even a new patch set to use the Intel QuickData DMA engines
>> for transfers rather than the CPU (a "blkmq" pmem driver).  It'd be
>> interesting if fio could use that hardware too (with direct access by
>> fio, not resorting to kernel read()/write() calls).
>
>
> I built the example performance tester program from Intel that compares
> memcpy with QuickData for various buffer and block sizes, and the best
> result was QuickData being the same speed as memcpy; otherwise, QuickData
> was between a tenth and half the speed.
> Given that, I'm planning to focus on just adding SSE (not sure about this
> one yet, since all x86_64 systems support it, so memcpy should be using it
> already), AVX, AVX-512 and A64 Advanced SIMD (for ARM64) to FIO.

Does that mean your assembly copy is better than memcpy for generic
memory-to-memory copies, or is it only better in relation to copying
to block devices?
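
A rough way to check the memory-to-memory case in isolation would be a
small timing harness along these lines. This is only a sketch under
assumptions: copy_words() is a hypothetical placeholder for the
assembly/SIMD routine (it is not code from fio or from the patch under
discussion), and the buffer sizes and the 256 MiB per measurement are
arbitrary choices.

```c
/* Sketch: time memcpy() against a hand-written copy, memory to memory.
 * copy_words() is a hypothetical stand-in for an assembly/SIMD copy. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef void (*copy_fn)(void *dst, const void *src, size_t len);

static void copy_memcpy(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
}

/* Hypothetical routine: moves 8 bytes at a time, then the tail bytes.
 * An optimizing compiler may well turn this back into memcpy. */
static void copy_words(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (len >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, s, sizeof(w));   /* unaligned-safe load */
        memcpy(d, &w, sizeof(w));   /* unaligned-safe store */
        s += sizeof(w);
        d += sizeof(w);
        len -= sizeof(w);
    }
    while (len--)
        *d++ = *s++;
}

/* Returns 1 if fn reproduces a patterned source buffer exactly, else 0. */
static int copy_is_correct(copy_fn fn, size_t len)
{
    unsigned char *src = malloc(len ? len : 1);
    unsigned char *dst = malloc(len ? len : 1);
    int ok = 0;

    if (src && dst) {
        for (size_t i = 0; i < len; i++)
            src[i] = (unsigned char)(i * 31 + 7);
        memset(dst, 0, len);
        fn(dst, src, len);
        ok = memcmp(dst, src, len) == 0;
    }
    free(src);
    free(dst);
    return ok;
}

/* CPU seconds spent on iters copies of len bytes. The call goes through
 * a volatile pointer so the compiler cannot inline or drop the copies. */
static double time_copy(copy_fn fn, void *dst, const void *src,
                        size_t len, int iters)
{
    copy_fn volatile call = fn;
    clock_t t0 = clock();

    for (int i = 0; i < iters; i++)
        call(dst, src, len);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

#ifdef COPY_BENCH_MAIN
int main(void)
{
    const size_t sizes[] = { 4096, 65536, 1 << 20, 16 << 20 };
    const size_t total = (size_t)256 << 20;  /* bytes per measurement */

    for (size_t i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
        size_t len = sizes[i];
        int iters = (int)(total / len);
        unsigned char *src = malloc(len), *dst = malloc(len);

        if (!src || !dst)
            return 1;
        memset(src, 0xA5, len);
        memset(dst, 0, len);
        assert(copy_is_correct(copy_words, len));
        printf("%9zu bytes: memcpy %.3fs, copy_words %.3fs\n", len,
               time_copy(copy_memcpy, dst, src, len, iters),
               time_copy(copy_words, dst, src, len, iters));
        free(src);
        free(dst);
    }
    return 0;
}
#endif
```

Building with something like "cc -O2 -DCOPY_BENCH_MAIN bench.c" enables
the demo main(). The timings are only indicative, since cache state and
compiler flags can change the picture considerably. Running the same two
routines against a mapped file or device would then show whether any
difference is specific to the block-device path.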

-- 
Sitsofe | http://sucs.org/~sits/