Re: [v5] MIPS: lib: csum_partial: more instruction paral

2015-04-06 21:52 GMT+08:00 Maciej W. Rozycki <macro@xxxxxxxxxxxxxx>:
> On Mon, 6 Apr 2015, cee1 wrote:
>
>> >  I'm not sure if any such other superscalar MIPS pipeline implementation
>> > exists, but if written correctly then at worst it won't hurt anyone else,
>> > so just make sure your change does not regress scalar MIPS pipelines.  I
>> > hope you have a way to verify it.
>>
>> It seems the P-Class of Warrior generation of MIPS CPU has a
>> superscalar MIPS pipeline(http://imgtec.com/mips/warrior/pclass.asp).
>
>  There have been many superscalar MIPS implementations, however I don't
> know offhand if any other have the restrictions like yours.

Hi, I guess I may not have made myself clear :)

The example only shows how this patch removes true data dependencies;
it does not imply any restrictions.

E.g.
ADDC(sum, t0)
ADDC(sum, t1)
ADDC(sum, t2)
ADDC(sum, t3)

which actually expand to the following instructions:
(1)  daddu  sum, t0;
(2)  sltu   v1, sum, t0;
(3)  daddu  sum, v1;

(4)  daddu  sum, t1;
(5)  sltu   v1, sum, t1;
(6)  daddu  sum, v1;

(7)  daddu  sum, t2;
(8)  sltu   v1, sum, t2;
(9)  daddu  sum, v1;

(10) daddu  sum, t3;
(11) sltu   v1, sum, t3;
(12) daddu  sum, v1;

Here, each instruction depends on the result of the previous
instruction, which is tough for any superscalar pipeline.
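
To make the chain concrete, here is a minimal C sketch of what one ADDC
step computes and how four of them line up serially. The names addc,
csum4_serial and addc_wraparound_example are made up for illustration
and are not taken from the kernel source; the sketch only mirrors the
daddu/sltu/daddu triple listed above.

```c
#include <assert.h>
#include <stdint.h>

/* One ADDC step (hypothetical helper): add x into sum, then fold the
 * carry-out back into the low bit (end-around carry). */
static inline uint64_t addc(uint64_t sum, uint64_t x)
{
    sum += x;                  /* daddu sum, x     */
    uint64_t carry = sum < x;  /* sltu  v1, sum, x */
    return sum + carry;        /* daddu sum, v1    */
}

/* Serial form: each step needs the sum produced by the step before it,
 * so the four ADDCs form one long dependency chain. */
static uint64_t csum4_serial(uint64_t sum, const uint64_t t[4])
{
    sum = addc(sum, t[0]);
    sum = addc(sum, t[1]);
    sum = addc(sum, t[2]);
    sum = addc(sum, t[3]);
    return sum;
}

/* The carry-out of the 64-bit add is wrapped around, not dropped. */
static void addc_wraparound_example(void)
{
    assert(addc(UINT64_MAX, 1) == 1);
}
```

Since every addc() call takes the previous return value as its input,
no two of the twelve underlying instructions are independent.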


With the patch applied, it becomes:
ADDC(t0, t1)
ADDC(t2, t3)
ADDC(sum, t0)
ADDC(sum, t2)

which actually expand to the following instructions:
(1)  daddu  t0, t1;
(2)  sltu   v1, t0, t1;
(3)  daddu  t0, v1;

(4)  daddu  t2, t3;
(5)  sltu   v1, t2, t3;
(6)  daddu  t2, v1;

(7)  daddu  sum, t0;
(8)  sltu   v1, sum, t0;
(9)  daddu  sum, v1;

(10) daddu  sum, t2;
(11) sltu   v1, sum, t2;
(12) daddu  sum, v1;

Here, for example, at least (1) and (4) can be issued in the same
cycle, as long as the CPU has enough execution units and a large enough
RS (Reservation Station), fetches instructions quickly enough, etc.
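
As a rough check that the regrouping does not change the result, here
is a minimal C sketch comparing the two orderings. The helper names
(addc, csum4_serial, csum4_paired, check_orders_agree) are hypothetical
and only model the instruction sequences shown in this mail; they are
not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* One ADDC step (hypothetical helper): 64-bit add with the carry-out
 * folded back in, as in the daddu/sltu/daddu triple. */
static inline uint64_t addc(uint64_t sum, uint64_t x)
{
    sum += x;
    uint64_t carry = sum < x;
    return sum + carry;
}

/* Original order: a single chain of four dependent ADDC steps. */
static uint64_t csum4_serial(uint64_t sum, uint64_t t0, uint64_t t1,
                             uint64_t t2, uint64_t t3)
{
    sum = addc(sum, t0);
    sum = addc(sum, t1);
    sum = addc(sum, t2);
    sum = addc(sum, t3);
    return sum;
}

/* Patched order: the first two ADDC steps touch disjoint inputs, so
 * neither has to wait for the other; only the last two use sum. */
static uint64_t csum4_paired(uint64_t sum, uint64_t t0, uint64_t t1,
                             uint64_t t2, uint64_t t3)
{
    t0 = addc(t0, t1);   /* does not depend on the next line */
    t2 = addc(t2, t3);   /* does not depend on the previous line */
    sum = addc(sum, t0);
    sum = addc(sum, t2);
    return sum;
}

/* Both orderings should produce the same folded sum. */
static void check_orders_agree(uint64_t sum, uint64_t t0, uint64_t t1,
                               uint64_t t2, uint64_t t3)
{
    assert(csum4_serial(sum, t0, t1, t2, t3) ==
           csum4_paired(sum, t0, t1, t2, t3));
}
```

The two functions agree because end-around-carry addition does not care
about the order in which the terms are combined, which is what lets the
patch regroup the ADDCs.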

What I want to say is that this patch removes some **true data
dependencies**, and hence should improve performance on (most?)
superscalar pipeline implementations.



-- 
Regards,

- cee1




