Re: [PATCH, v2] MIPS: lib: csum_partial: more instruction paral

2014-05-19 14:59 GMT+08:00 James Hogan <james.hogan@xxxxxxxxxx>:
> On Monday 19 May 2014 11:14:07 chenj wrote:
>> Computing the sum introduces a true data dependency, e.g.
>>       ADDC(sum, t0)
>>       ADDC(sum, t1)
>>       ADDC(sum, t2)
>>       ADDC(sum, t3)
>> Here, each ADDC(sum, ...) references the sum value updated by the previous ADDC.
>>
>> In this patch, the above sequence is adjusted as follows:
>>       ADDC(t0, t1)
>>       ADDC(t2, t3)
>>       ADDC(sum, t0)
>>       ADDC(sum, t2)
>> The first two ADDC operations are independent, hence can be executed
>> simultaneously if possible.
>
> The actual patch appears to change it to this:
> ADDC(t0, t1)
> ADDC(sum, t0)
> ADDC(t2, t3)
> ADDC(sum, t2)
>
> which is slightly different (presumably due to the interleaved stores in some
> of the cases).
>
>> This patch improves instruction-level parallelism and brings up to a 50%
>> csum performance gain on the Loongson 3a processor[1].
>
> Nice results.
>
> The stuff below the --- will get dropped when the patch is applied though,
> after which the "[1]" won't refer to anything.
>
Thanks for the suggestion; I'll amend the commit message accordingly.

Basically, the patch reduces the number of places where one ADDC depends
on the result of the previous ADDC.
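To make the dependency argument concrete, here is a minimal C model. It assumes ADDC behaves like the csum_partial.S macro (add, then add back the carry detected with sltu); the names addc, sum4_serial and sum4_paired are hypothetical and only illustrate the two orderings from the commit message.

```c
#include <stdint.h>

/* Model of one ADDC step: add reg into sum, then add back the carry-out
 * (end-around carry). The carry test mirrors "sltu v1, sum, reg". */
static uint64_t addc(uint64_t sum, uint64_t reg)
{
    sum += reg;          /* wraps modulo 2^64 */
    sum += (sum < reg);  /* wrapped iff the result is below reg */
    return sum;
}

/* Original order: each step reads the sum written by the previous one,
 * so all four additions sit on a single dependency chain through sum. */
static uint64_t sum4_serial(uint64_t sum, uint64_t t0, uint64_t t1,
                            uint64_t t2, uint64_t t3)
{
    sum = addc(sum, t0);
    sum = addc(sum, t1);
    sum = addc(sum, t2);
    sum = addc(sum, t3);
    return sum;
}

/* Reordered as in the commit message: t0+t1 and t2+t3 depend neither on
 * each other nor on sum, so they can issue together; only two additions
 * remain on the chain through sum. */
static uint64_t sum4_paired(uint64_t sum, uint64_t t0, uint64_t t1,
                            uint64_t t2, uint64_t t3)
{
    t0 = addc(t0, t1);
    t2 = addc(t2, t3);
    sum = addc(sum, t0);
    sum = addc(sum, t2);
    return sum;
}
```

In sum4_serial every addition waits for the one before it; in sum4_paired the loop-carried chain through sum shrinks from four ADDCs to two per group of four words. Both sum4_serial(5, UINT64_MAX, 2, 3, 4) and sum4_paired(5, UINT64_MAX, 2, 3, 4) return 14, with the wrap-around carry folded back in either way.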

BTW, I'm not sure whether the sum produced by the new implementation is
always equivalent to the original one, but in my tests (make run_test from
csum_test.tar.gz, plus a kernel patch that compares the two) it is.
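On the equivalence question: if ADDC is modelled as a 64-bit add with end-around carry (one's-complement addition), regrouping should give bit-identical results, not just congruent ones. A single step returns 0 only when both operands are 0, and otherwise returns the value in [1, 2^64 - 1] that is congruent to the ordinary sum modulo 2^64 - 1; by induction the same holds for any grouping of ADDCs over the same words. The sketch below is a hypothetical randomized check of that claim (csum_serial, csum_paired, pick_word and orders_agree are illustrative names, not kernel functions). It models only the word-accumulation loop, not the tail handling, folding or unaligned paths of the real csum_partial.S.

```c
#include <stddef.h>
#include <stdint.h>

/* ADDC modelled as add-with-end-around-carry (one's-complement addition). */
static uint64_t addc(uint64_t sum, uint64_t reg)
{
    sum += reg;
    sum += (sum < reg);
    return sum;
}

/* Reference: one long dependency chain through sum. */
static uint64_t csum_serial(const uint64_t *buf, size_t n, uint64_t sum)
{
    for (size_t i = 0; i < n; i++)
        sum = addc(sum, buf[i]);
    return sum;
}

/* Pairwise grouping in the interleaved order the actual patch uses:
 * ADDC(t0, t1); ADDC(sum, t0); ADDC(t2, t3); ADDC(sum, t2). */
static uint64_t csum_paired(const uint64_t *buf, size_t n, uint64_t sum)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        uint64_t t0 = addc(buf[i], buf[i + 1]);
        sum = addc(sum, t0);
        uint64_t t2 = addc(buf[i + 2], buf[i + 3]);
        sum = addc(sum, t2);
    }
    for (; i < n; i++)           /* leftover words, if any */
        sum = addc(sum, buf[i]);
    return sum;
}

/* xorshift64 PRNG; the state must never be zero. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Mix random words with 0 and UINT64_MAX so that carries and the
 * zero / all-ones boundary are exercised often. */
static uint64_t pick_word(uint64_t *state)
{
    switch (xorshift64(state) & 3) {
    case 0:
        return 0;
    case 1:
        return UINT64_MAX;
    default:
        return xorshift64(state);
    }
}

/* Returns 1 if both orders agree on every random buffer tried, else 0. */
static int orders_agree(unsigned long iterations, uint64_t seed)
{
    uint64_t state = seed ? seed : 1;
    uint64_t buf[15];            /* up to three groups of 4 plus a tail */

    for (unsigned long it = 0; it < iterations; it++) {
        uint64_t sum = pick_word(&state);
        size_t n = (size_t)(xorshift64(&state) % 16);   /* 0..15 words */
        for (size_t k = 0; k < n; k++)
            buf[k] = pick_word(&state);
        if (csum_serial(buf, n, sum) != csum_paired(buf, n, sum))
            return 0;
    }
    return 1;
}
```

For instance, the buffer {UINT64_MAX, UINT64_MAX, 1, 0} with an initial sum of 0 gives 1 under both orders, and orders_agree(100000, 12345) should return 1. A randomized check like this complements, but does not replace, comparing the real assembly against the old version.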
