2014-05-19 23:32 GMT+08:00 Chen Jie <chenj@xxxxxxxxxx>: > 2014-05-19 14:59 GMT+08:00 James Hogan <james.hogan@xxxxxxxxxx>: >> On Monday 19 May 2014 11:14:07 chenj wrote: >>> Computing sum introduces true data dependency, e.g. >>> ADDC(sum, t0) >>> ADDC(sum, t1) >>> ADDC(sum, t2) >>> ADDC(sum, t3) >>> Here, each ADDC(sum, ...) references the sum value updated by previous ADDC. >>> >>> In this patch, above sequence is adjusted as following: >>> ADDC(t0, t1) >>> ADDC(t2, t3) >>> ADDC(sum, t0) >>> ADDC(sum, t2) >>> The first two ADDC operations are independent, hence can be executed >>> simultaneously if possible. >> >> The actual patch appears to change it to this: >> ADDC(t0, t1) >> ADDC(sum, t0) >> ADDC(t2, t3) >> ADDC(sum, t2) >> >> which is slightly different (presumably due to the interleaved stores in some >> of the cases). >> >>> This patch improves instruction level parallelism, and brings at most 50% >>> csum performance gain on Loongson 3a processor[1]. >> >> Nice results. >> >> The stuff below the --- will get dropped when the patch is applied though, >> after which the "[1]" won't refer to anything. >> > Thanks for your suggestion, I'll amend the commit message further later. > > Basically, the patch reduces the case of one ADDC depending on the > result of the previous ADDC. > > BTW, I'm not sure whether the sum value of the new implementation is > equivalent to the original one, but in my test(make run_test of the > csum_test.tar.gz, and a comparing patch in kernel) it is. It is equivalent to the original one. More explanation about the math behind "x ADDC y ADDC z == x ADDC (y ADDC z)": Let C = the value of '(uint64_t) -1 + 1' in 64bit case, then 0 <= x <= C-1 0 <= y <= C-1 0 <= z <= C-1 Here 'x ADDC y' is defined as x + y >= C ? x + y - C + 1 : x + y Case 1: x + y >= C && x + y - C + 1 + z < C (i.e. x ADDC y ADDC z = x + y - C +1 + z) if y + z >= C: => y ADDC z = y + z - C + 1 C > x + y + z - C + 1 => The result is x + y + z - C + 1 if y + z < C: => y ADDC z = y + z x + y >= C => x + (y + z) >= C + z >= C => The result is x + y + z - C + 1 Case 2: x + y < C && x + y + z >= C (i.e. x ADDC y ADDC z = x + y + z - C + 1) if y + z >= C: => y ADDC z = y + z - C + 1 C > x + y => C + z - C + 1 > x + y + z - C + 1 => z + 1 > x + y + z - C + 1 => C >= z + 1 > x + y + z - C + 1 => The result is x + y + z - C + 1 if y + z < C: => y ADDC z = y + z x + y + z >= C => The result is x + y + z - C + 1 Case 3: x + y >= C && x + y - C + 1 + z >= C (i.e. x ADDC y ADDC z = x + y + z - 2C + 2) x + y - C + 1 + z >= C => y + z >= 2C - 1 - x C >= 1 + x => 2C - 1 - x >= C => y + z >= C => y ADDC z = y + z - C + 1 x + y - C + 1 + z >= C => The result is x + y + z - 2C + 2 Case 4: x + y < C && x + y + z < C (i.e. x ADDC y ADDC z = x + y + z) x + y + z < C => y + z < C - x <= C => y + z < C => y ADDC z = y + z x + y + z < C => The result is x + y + z