Re: [PATCH, v2] MIPS: lib: csum_partial: more instruction paral

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



2014-05-19 23:32 GMT+08:00 Chen Jie <chenj@xxxxxxxxxx>:
> 2014-05-19 14:59 GMT+08:00 James Hogan <james.hogan@xxxxxxxxxx>:
>> On Monday 19 May 2014 11:14:07 chenj wrote:
>>> Computing sum introduces true data dependency, e.g.
>>>       ADDC(sum, t0)
>>>       ADDC(sum, t1)
>>>       ADDC(sum, t2)
>>>       ADDC(sum, t3)
>>> Here, each ADDC(sum, ...) references the sum value updated by previous ADDC.
>>>
>>> In this patch, above sequence is adjusted as following:
>>>       ADDC(t0, t1)
>>>       ADDC(t2, t3)
>>>       ADDC(sum, t0)
>>>       ADDC(sum, t2)
>>> The first two ADDC operations are independent, hence can be executed
>>> simultaneously if possible.
>>
>> The actual patch appears to change it to this:
>> ADDC(t0, t1)
>> ADDC(sum, t0)
>> ADDC(t2, t3)
>> ADDC(sum, t2)
>>
>> which is slightly different (presumably due to the interleaved stores in some
>> of the cases).
>>
>>> This patch improves instruction level parallelism, and brings at most 50%
>>> csum performance gain on Loongson 3a processor[1].
>>
>> Nice results.
>>
>> The stuff below the --- will get dropped when the patch is applied though,
>> after which the "[1]" won't refer to anything.
>>
> Thanks for your suggestion, I'll amend the commit message further later.
>
> Basically, the patch reduces the case of one ADDC depending on the
> result of the previous ADDC.
>
> BTW, I'm not sure whether the sum value of the new implementation is
> equivalent to the original one, but in my test(make run_test of the
> csum_test.tar.gz, and a comparing patch in kernel) it is.

It is equivalent to the original one.
More explanation about the math behind "x ADDC y ADDC z == x ADDC (y ADDC z)":

Let C = the value of '(uint64_t) -1 + 1' in 64bit case, then
0 <= x <= C-1
0 <= y <= C-1
0 <= z <= C-1

Here 'x ADDC y' is defined as
x + y >= C ? x + y - C + 1 : x + y


Case 1: x + y >= C && x + y - C + 1 + z < C (i.e. x ADDC y ADDC z = x
+ y - C +1 + z)
if y + z >= C:
=> y ADDC z = y + z - C + 1
C > x + y + z - C + 1
=> The result is x + y + z - C + 1

if y + z < C:
=> y ADDC z = y + z
x + y >= C
=> x + (y + z) >= C + z >= C
=> The result is x + y + z - C + 1

Case 2: x + y < C && x + y + z >= C (i.e. x ADDC y ADDC z = x + y + z - C + 1)
if y + z >= C:
=> y ADDC z = y + z - C + 1
C > x + y
=> C + z - C + 1 > x + y + z - C + 1
=> z + 1 > x + y + z - C + 1
=> C >= z + 1 > x + y + z - C + 1
=> The result is x + y + z - C + 1

if y + z < C:
=> y ADDC z = y + z
x + y + z >= C
=> The result is x + y + z - C + 1

Case 3: x + y >= C && x + y - C + 1 + z >= C (i.e. x ADDC y ADDC z = x
+ y + z - 2C + 2)
x + y - C + 1 + z >= C
=> y + z >= 2C - 1 - x
C >= 1 + x
=> 2C - 1 - x >= C
=> y + z >= C
=> y ADDC z = y + z - C + 1

x + y - C + 1 + z >= C
=> The result is  x + y + z - 2C + 2


Case 4: x + y < C && x + y + z < C (i.e. x ADDC y ADDC z = x + y + z)
x + y + z < C
=> y + z < C - x <= C
=> y + z < C
=> y ADDC z = y + z

x + y + z < C
=> The result is x + y + z




[Index of Archives]     [Linux MIPS Home]     [LKML Archive]     [Linux ARM Kernel]     [Linux ARM]     [Linux]     [Git]     [Yosemite News]     [Linux SCSI]     [Linux Hams]

  Powered by Linux