Re: [v5] MIPS: lib: csum_partial: more instruction paral

cee1 <fykcee1@xxxxxxxxx> · Tue, 31 Mar 2015 16:34:39 +0800

2015-03-31 4:10 GMT+08:00 Ralf Baechle <ralf@xxxxxxxxxxxxxx>:
>> One example about how this patch works is in CSUM_BIGCHUNK1:
>> // ** original **    vs    ** patch applied **
>>     ADDC(sum, t0)           ADDC(t0, t1)
>>     ADDC(sum, t1)           ADDC(t2, t3)
>>     ADDC(sum, t2)           ADDC(sum, t0)
>>     ADDC(sum, t3)           ADDC(sum, t2)
>>
>> With this patch applied, ADDC and the **next next** ADDC are independent.
>
> This is interesting because even CPUs as old as the R2000 have a pipeline
> bypass which allows an instruction to use a result written to a register
> by an immediately preceding instruction.

But if some dependencies are removed (as the patch does), instruction A
and instruction B can be issued in the same cycle[1], instead of B
waiting for the result of A. (A pipeline bypass reduces the wait time
but does not eliminate it, right?)
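
To make the dependency argument concrete, here is a minimal C model (a
sketch, not kernel code). It assumes ADDC(sum, reg) behaves like the
32-bit add / sltu / add-carry sequence used in
arch/mips/lib/csum_partial.S. The names addc, csum4_serial, csum4_paired
and csum4_checked are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of ADDC(sum, reg): ones' complement add with
 * end-around carry. The steps are serially dependent:
 *   ADD sum, reg ; sltu v1, sum, reg ; ADD sum, v1            */
uint32_t addc(uint32_t sum, uint32_t reg)
{
	sum += reg;           /* wrapping add                         */
	sum += (sum < reg);   /* carry out of the add, folded back in */
	return sum;
}

/* Original ordering: every ADDC depends on the previous one through
 * sum, so the four ADDCs form one serial chain. */
uint32_t csum4_serial(uint32_t sum, const uint32_t t[4])
{
	sum = addc(sum, t[0]);
	sum = addc(sum, t[1]);
	sum = addc(sum, t[2]);
	sum = addc(sum, t[3]);
	return sum;
}

/* Patched ordering: the two pair sums do not depend on each other or
 * on sum, so a core with two or more ALUs could issue them together.
 * Only two ADDCs remain on the chain through sum. */
uint32_t csum4_paired(uint32_t sum, const uint32_t t[4])
{
	uint32_t t0 = addc(t[0], t[1]);   /* independent of ...  */
	uint32_t t2 = addc(t[2], t[3]);   /* ... this pair sum    */
	sum = addc(sum, t0);
	sum = addc(sum, t2);
	return sum;
}

/* End-around-carry addition is commutative and associative, so the
 * reordering must not change the result; check that here. */
uint32_t csum4_checked(uint32_t sum, const uint32_t t[4])
{
	uint32_t r = csum4_paired(sum, t);
	assert(r == csum4_serial(sum, t));
	return r;
}
```

In a loop over many 4-word chunks, the chain carried through sum shrinks
from four ADDCs per chunk to two. On an out-of-order core the pair sums
for the next chunk could then be computed while sum is still being
updated. That is the assumed source of the gain, not a measured
breakdown.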

>
> Can you explain why this patch is so beneficial for Loongson 3A?

I have written a simple test[2] to measure the performance gain on
Loongson 3A; the results[3] show a gain of up to 50%.

IMHO, the patch benefits not only Loongson 3A but would also benefit
other MIPS CPUs.

--
1. If the hardware supports it, e.g. it has at least two ALUs for
integer operations and an out-of-order execution pipeline.
2. http://dev.lemote.com/files/upload/software/csum-opti/csum-test.tar.gz
3. http://dev.lemote.com/files/upload/software/csum-opti/csum-opti-benchmark.html

Regards,

- cee1