2014-05-25 14:02 GMT+08:00 chenj <chenj@xxxxxxxxxx>: > Computing sum introduces true data dependency. This patch removes some > true data depdendencies, hence instruction level parallelism is > improved. > > This patch brings at most 50% csum performance gain on Loongson 3a > processor in our test. > > One example about how this patch works is in CSUM_BIGCHUNK1: > // ** original ** vs ** patch applied ** > ADDC(sum, t0) ADDC(t0, t1) > ADDC(sum, t1) ADDC(t2, t3) > ADDC(sum, t2) ADDC(sum, t0) > ADDC(sum, t3) ADDC(sum, t2) Following is the math behind the above transformation, i.e. math proof of equalization of the left and right. For convenience of expression: 1. Mark "ADDC(A, B)" as "A 烫 B", then A 烫 B = (A + B >= 2^64) ? A + B - 2^64 + 1 : A+B 2. Mark 2^64 as @ What we prove is: sum 烫 A 烫 B == sum 烫 (A 烫 B) There are two add operations in "sum 烫 A 烫 B". According to whether each add op overflows, there are four cases: Case 1: The first add op overflows, the second **not** Case 2: The first add op does **not** overflow, the second does Case 3: Both the first and the second add op overflow Case 4: **Neither** the first nor the second add op overflow ==== Case 1: The first add op overflows, the second **not**, that means: 1 sum + A >= @ 2 sum + A - @ + 1 + B < @ sum 烫 A 烫 B = sum + A + B - @ + 1 If A + B >= @: => A 烫 B = A + B - @ + 1 Q: sum + (A 烫 B) = sum + A + B - @ + 1 OVERFLOWS?(i.e. sum + A + B - @ + 1 >= @) sum 烫 (A 烫 B) = sum + A + B - @ + 1 # see (2) If A + B < @: => A 烫 B = A + B Q: sum + (A 烫 B) = sum + A + B OVERFLOWS? sum + A + B >= @ + B >= @ # adds B to both sides of (1) => sum 烫 (A 烫 B) = sum + A + B - @ + 1 ==== Case 2: The first add op does **not** overflow, the second does, that means: 1 sum + A < @ 2 sum + A + B >= @ sum 烫 A 烫 B = sum + A + B - @ + 1 If A + B >= @: => A 烫 B = A + B - @ + 1 Q: sum + (A 烫 B) = sum + A + B - @ + 1 OVERFLOWS? sum + A + B - @ + 1 < @ + B - @ + 1 # adds 'B-@+1' to both sides of (1) => sum + A + B - @ + 1 < B + 1 <= @ => sum + A + B - @ + 1 < @ => sum 烫 (A 烫 B) = sum + A + B - @ + 1 If A + B < @: => A 烫 B = A + B Q: sum + (A 烫 B) = sum + A + B OVERFLOWS? sum 烫 (A 烫 B) = sum + A + B - @ + 1 # see (2) ==== Case 3: Both the first and the second add op overflow, that means: 1 sum + A >= @ 2 sum + A + B - @ + 1 >= @ sum 烫 A 烫 B = sum + A + B - 2@ + 2 3 A + B >= 2@ - 1 - sum # transformed from (2) 1 + sum <= @ => @ + 1 + sum <= 2@ => @ <= 2@ - 1 - sum => @ <= 2@ - 1 - sum <= A + B # see (3) => A + B >= @ => A 烫 B = A + B - @ + 1 Q: sum + (A 烫 B) = sum + A + B - @ + 1 OVERFLOWS? sum 烫 (A 烫 B) = sum + A + B - 2@ + 2 # see (2) ==== Case 4: Neither the first nor the second add op overflow, that means: 1 sum + A < @ 2 sum + A + B < @ sum 烫 A 烫 B = sum + A + B A + B < @ - sum <= @ # transformed from (2) => A + B < @ => A 烫 B = A + B Q: sum + (A 烫 B) = sum + A + B OVERFLOWS? sum 烫 (A 烫 B) = sum + A + B # see (2) Conclusion: sum 烫 A 烫 B == sum 烫 (A 烫 B) -- Regards, - cee1