RE: memcpy test

> -----Original Message-----
> From: fio-owner@xxxxxxxxxxxxxxx [mailto:fio-owner@xxxxxxxxxxxxxxx] On
> Behalf Of Jens Axboe
> Sent: Friday, December 1, 2017 12:20 PM
> To: fio@xxxxxxxxxxxxxxx
> Cc: Rebecca Cran <rebecca@xxxxxxxxxxxx>; Sitsofe Wheeler
> <sitsofe@xxxxxxxxx>; Robert Elliott (Persistent Memory) <elliott@xxxxxxx>
> Subject: memcpy test
> 
> Hi,
> 
> Reviving this topic, since I think it's interesting in the presence
> of persistent memory engines that rely heavily on optimized memcpy
> to be fast.
> 
> Similar to how we have --crctest, I added --memcpytest. Very basic,
> just wanted to get the ball rolling. Basically it just copies between
> two 32MB chunks, using whatever implementation you would like, and in
> increments of some defined size. This is what it spits out on my
> laptop:
> 
> memcpy
> 	8 bytes:	 3360.94 MiB/sec
> 	16 bytes:	 4363.47 MiB/sec
> 	96 bytes:	 6804.46 MiB/sec
> 	128 bytes:	 6391.39 MiB/sec
> 	256 bytes:	 6571.09 MiB/sec
> 	512 bytes:	 6962.77 MiB/sec
> 	2048 bytes:	 6212.73 MiB/sec
> 	8192 bytes:	 6465.14 MiB/sec
> 	131072 bytes:	 6412.24 MiB/sec
> 	262144 bytes:	 6607.03 MiB/sec
> 	524288 bytes:	 6372.90 MiB/sec
> memmove
> 	8 bytes:	 2503.90 MiB/sec
> 	16 bytes:	 4311.81 MiB/sec
> 	96 bytes:	 6734.74 MiB/sec
> 	128 bytes:	 6080.16 MiB/sec
> 	256 bytes:	 6162.92 MiB/sec
> 	512 bytes:	 7309.80 MiB/sec
> 	2048 bytes:	 6931.94 MiB/sec
> 	8192 bytes:	 6878.97 MiB/sec
> 	131072 bytes:	 6787.05 MiB/sec
> 	262144 bytes:	 6877.77 MiB/sec
> 	524288 bytes:	 6695.26 MiB/sec
> simple
> 	8 bytes:	 1813.59 MiB/sec
> 	16 bytes:	 2191.63 MiB/sec
> 	96 bytes:	 7360.76 MiB/sec
> 	128 bytes:	 7192.63 MiB/sec
> 	256 bytes:	 7340.00 MiB/sec
> 	512 bytes:	 7158.04 MiB/sec
> 	2048 bytes:	 7495.96 MiB/sec
> 	8192 bytes:	 7315.30 MiB/sec
> 	131072 bytes:	 7565.82 MiB/sec
> 	262144 bytes:	 7410.95 MiB/sec
> 	524288 bytes:	 7537.09 MiB/sec
> 
> which is kind of depressing, since the fastest for larger sizes is the
> very dumb and basic implementation that you'll find in any text book
> under the section of "my first memcpy".
> 
> Anyway, for evaluating implementations, we need a way to test them,
> and now we have. I'll be happy to take input/patches on the test
> itself.

Some considerations/points:
* pin the thread to a single CPU core so the scheduler doesn't migrate it
  during the test
* ensure the memory buffers are allocated on the local NUMA node (unless
  intentionally testing remote bandwidth)
* CPU caches will distort results; it's important to flush both source and
  destination addresses out of the caches before starting, then start the timer,
  do the copy, flush the caches again, then stop the timer.
  If the copy function uses non-temporal stores, though, the second cache
  flush is not needed and would unfairly penalize it.
* one CPU will be limited to about 10 GB/s for various interesting reasons;
  you need multiple CPUs active to saturate memory channels
* integrating Agner Fog's assembly language memory function library might
  be a good option, if fio can take GPLv3 code. That way fio would show
  what the processors are capable of achieving, for comparison to what
  the installed system libraries do. See http://www.agner.org/optimize -
  section 17.9 of "Optimizing subroutines in assembly language" discusses
  the memcpy functions.
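
To make the first three points concrete, here is a rough sketch in C of what
the measurement loop could look like. It is not taken from fio's --memcpytest;
the helper names (pin_to_cpu, flush_range, timed_copy_mib_per_sec) and the
64-byte cache line size are my own assumptions, the pinning is Linux/glibc
specific, and the flush uses the x86 clflush instruction (it compiles to a
no-op elsewhere).

```c
#define _GNU_SOURCE            /* for sched_setaffinity() and CPU_SET() on glibc */
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#if defined(__SSE2__)
#include <emmintrin.h>         /* _mm_clflush(), _mm_mfence() */
#endif

#define CACHE_LINE 64          /* assumption: 64-byte cache lines */

/* Pin the calling thread to one CPU. Returns 0 on success, -1 on failure
 * (or when the affinity macros are not available on this platform). */
static int pin_to_cpu(int cpu)
{
#ifdef CPU_SET
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
#else
    (void)cpu;
    return -1;
#endif
}

/* Evict every cache line overlapping [p, p + len). No-op without SSE2. */
static void flush_range(const void *p, size_t len)
{
#if defined(__SSE2__)
    uintptr_t a = (uintptr_t)p & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)p + len;
    for (; a < end; a += CACHE_LINE)
        _mm_clflush((const void *)a);
    _mm_mfence();              /* wait for the flushes to complete */
#else
    (void)p;
    (void)len;
#endif
}

/* Copy `total` bytes in `chunk`-sized pieces and return MiB/sec.
 * Set flush_after to 0 for copy routines that use non-temporal stores. */
static double timed_copy_mib_per_sec(void *dst, const void *src, size_t total,
                                     size_t chunk, int flush_after)
{
    struct timespec t0, t1;
    size_t off, copied = 0;
    double secs;

    assert(chunk > 0 && chunk <= total);

    /* Start cold: neither buffer should be cached when the timer starts. */
    flush_range(src, total);
    flush_range(dst, total);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (off = 0; off + chunk <= total; off += chunk) {
        memcpy((char *)dst + off, (const char *)src + off, chunk);
        copied += chunk;
    }
    if (flush_after)
        flush_range(dst, total);   /* charge the write-back to the copy */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    secs = (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0.0)
        return 0.0;
    return ((double)copied / (1024.0 * 1024.0)) / secs;
}
```

The intended order is: call pin_to_cpu() first, then allocate the buffers and
memset() them from the pinned thread (with Linux's default policy the pages
are normally placed on the node of the CPU that first touches them), and only
then call timed_copy_mib_per_sec(dst, src, total, chunk, 1). Swapping memcpy
for another copy routine is a one-line change in the loop.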


