On 01/21/2014 12:25 PM, David Daney wrote:
On 01/21/2014 08:18 AM, Steven J. Hill wrote:
From: Leonid Yegoshin <Leonid.Yegoshin@xxxxxxxxxx>
Use the PREF instruction to optimize partial checksum operations.
Signed-off-by: Leonid Yegoshin <Leonid.Yegoshin@xxxxxxxxxx>
Signed-off-by: Steven J. Hill <Steven.Hill@xxxxxxxxxx>
NACK. The proper latency and cacheline stride vary by CPU; you cannot
just hard code them for 32-byte cacheline size with some random latency.
This will make some CPUs slower.
Note that memcpy.S already uses a fixed cache line size (32 bytes), so this is
merely doing the same thing. I assume you have some empirical evidence
concerning other CPUs being slower?
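
To make the disagreement concrete, here is a minimal C sketch of a checksum loop in which the cache line size and prefetch distance are parameters rather than hard-coded constants. It is not the patch under discussion, which is MIPS assembly using PREF. The names `prefetch_cfg` and `csum_sketch` are hypothetical, the GCC/Clang builtin `__builtin_prefetch` stands in for the PREF instruction, and the checksum is a plain 16-bit ones' complement sum over big-endian words.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tunables, not taken from the patch: the point of the
 * objection is that good values differ from one CPU to another. */
struct prefetch_cfg {
    size_t line_size;   /* cache line size in bytes; must be even, e.g. 32 */
    size_t lines_ahead; /* how many cache lines ahead to prefetch */
};

/* 16-bit ones' complement sum of buf, reading big-endian 16-bit words.
 * An odd trailing byte is treated as the high byte of a final word. */
static uint16_t csum_sketch(const uint8_t *buf, size_t len,
                            const struct prefetch_cfg *cfg)
{
    uint64_t sum = 0;
    size_t dist = cfg->line_size * cfg->lines_ahead;
    size_t i = 0;

    while (i + 1 < len) {
        /* Issue one read prefetch per line_size bytes of progress, and
         * only for addresses that still lie inside the buffer. Offsets
         * are relative to buf, not to true cache line boundaries. */
        if (cfg->line_size != 0 && i % cfg->line_size == 0 && i + dist < len)
            __builtin_prefetch(buf + i + dist, 0, 0);

        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
        i += 2;
    }

    if (i < len)                    /* odd trailing byte */
        sum += (uint32_t)buf[i] << 8;

    while (sum >> 16)               /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)sum;
}
```

The result does not depend on the configuration; only the timing does. Calling `csum_sketch` with `{32, 2}` and with `{128, 4}` returns the same sum, so whether a fixed 32-byte stride makes a particular CPU slower can only be settled by measuring on that CPU.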