Re: [PATCH] MIPS: lib: Optimize partial checksum ops using prefetching.

Florian Fainelli <f.fainelli@xxxxxxxxx> · Tue, 21 Jan 2014 12:25:39 -0800



2014/1/21 Steven J. Hill <Steven.Hill@xxxxxxxxxx>:
> On 01/21/2014 12:25 PM, David Daney wrote:
>>
>> On 01/21/2014 08:18 AM, Steven J. Hill wrote:
>>>
>>> From: Leonid Yegoshin <Leonid.Yegoshin@xxxxxxxxxx>
>>>
>>> Use the PREF instruction to optimize partial checksum operations.
>>>
>>> Signed-off-by: Leonid Yegoshin <Leonid.Yegoshin@xxxxxxxxxx>
>>> Signed-off-by: Steven J. Hill <Steven.Hill@xxxxxxxxxx>
>>
>>
>> NACK.  The proper latency and cacheline stride vary by CPU; you cannot
>> just hard code them for 32-byte cacheline size with some random latency.
>>
>> This will make some CPUs slower.
>>
> Note that memcpy.S already uses fixed cache lines (32 bytes) so this is
> merely doing the same thing. I assume you have some empirical evidence
> concerning other CPUs being slower?

How about using cpu_dcache_line_size()/MIPS_L1_CACHE_SHIFT? These
should provide a good hint. Octeon has a 128-byte D$ line size, so
prefetching in 32-byte slices is most likely suboptimal.
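
To illustrate what I mean, here is a rough user-space sketch in C. It is
not a patch against csum_partial.S and I have not measured anything. The
helper names prefetch_stride() and prefetch_count() are made up for
illustration, the line size is a plain parameter standing in for whatever
cpu_dcache_line_size() would report, and the buffer is assumed to start
on a line boundary:

```c
#include <stddef.h>

/*
 * Hypothetical helper: take the prefetch stride from the D$ line size
 * instead of hard coding 32 bytes. Falls back to 32 when the line size
 * is unknown (passed as 0).
 */
static size_t prefetch_stride(size_t dcache_line_size)
{
	return dcache_line_size ? dcache_line_size : 32;
}

/*
 * Hypothetical helper: number of prefetches needed to touch every line
 * of a len-byte buffer, assuming the buffer starts on a line boundary.
 */
static size_t prefetch_count(size_t len, size_t dcache_line_size)
{
	size_t stride = prefetch_stride(dcache_line_size);

	return (len + stride - 1) / stride;
}
```

With a 128-byte line, a 256-byte buffer needs 2 prefetches under this
scheme, while a fixed 32-byte stride issues 8, several of them to a line
that has already been requested.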
-- 
Florian