2014/1/21 Steven J. Hill <Steven.Hill@xxxxxxxxxx>:
> On 01/21/2014 12:25 PM, David Daney wrote:
>>
>> On 01/21/2014 08:18 AM, Steven J. Hill wrote:
>>>
>>> From: Leonid Yegoshin <Leonid.Yegoshin@xxxxxxxxxx>
>>>
>>> Use the PREF instruction to optimize partial checksum operations.
>>>
>>> Signed-off-by: Leonid Yegoshin <Leonid.Yegoshin@xxxxxxxxxx>
>>> Signed-off-by: Steven J. Hill <Steven.Hill@xxxxxxxxxx>
>>
>>
>> NACK. The proper latency and cacheline stride vary by CPU, you cannot
>> just hard code them for 32-byte cacheline size with some random latency.
>>
>> This will make some CPUs slower.
>>
> Note that memcpy.S already uses fixed cache lines (32 bytes) so this is
> merely doing the same thing. I assume you have some empirical evidence
> concerning other CPUs being slower?

How about using cpu_dcache_line_size()/MIPS_L1_CACHE_SHIFT? These should
provide a good hint. Octeon has a 128-byte D$ line size, so prefetching
in 32-byte slices is most likely suboptimal.
-- 
Florian
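For illustration only, a minimal C sketch of the suggestion above: derive the prefetch stride from a run-time cache line size instead of hard-coding 32 bytes. It assumes GCC/Clang's `__builtin_prefetch`; the function name `sum_words_prefetch`, the `PREFETCH_LINES` look-ahead distance, and the plain 32-bit word sum (standing in for the real ones' complement checksum loop) are hypothetical and not taken from the patch. In the kernel, `line_size` would presumably come from something like `cpu_dcache_line_size()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical look-ahead distance, measured in cache lines. */
#define PREFETCH_LINES 4

/*
 * Hypothetical sketch: sum 32-bit words while prefetching PREFETCH_LINES
 * cache lines ahead. The prefetch stride is the run-time line_size
 * argument rather than a hard-coded 32 bytes, so exactly one prefetch is
 * issued per cache line whatever the CPU's line size is.
 */
static uint64_t sum_words_prefetch(const uint32_t *buf, size_t nwords,
                                   size_t line_size)
{
    /* Real D-cache line sizes are non-zero powers of two. */
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);

    size_t total = nwords * sizeof(uint32_t); /* buffer length in bytes */
    size_t next_pf = 0;                       /* byte offset of next prefetch */
    uint64_t sum = 0;

    for (size_t i = 0; i < nwords; i++) {
        size_t off = i * sizeof(uint32_t);

        /* Entering a new line-sized chunk: prefetch a later one. */
        if (off >= next_pf) {
            size_t ahead = off + PREFETCH_LINES * line_size;
            if (ahead < total)
                __builtin_prefetch((const char *)buf + ahead, 0 /* read */, 0);
            next_pf = off + line_size;
        }
        sum += buf[i];
    }
    return sum;
}
```

With `line_size` set to 128 (Octeon), the loop issues one prefetch per 128-byte line, whereas a fixed 32-byte stride would issue four per line, three of them redundant. Prefetching never changes the computed result, only (possibly) its speed.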