2015-04-06 21:52 GMT+08:00 Maciej W. Rozycki <macro@xxxxxxxxxxxxxx>:
> On Mon, 6 Apr 2015, cee1 wrote:
>
>> > I'm not sure if any such other superscalar MIPS pipeline implementation
>> > exists, but if written correctly then at worst it won't hurt anyone else,
>> > so just make sure your change does not regress scalar MIPS pipelines. I
>> > hope you have a way to verify it.
>>
>> It seems the P-Class of Warrior generation of MIPS CPU has a
>> superscalar MIPS pipeline(http://imgtec.com/mips/warrior/pclass.asp).
>
> There have been many superscalar MIPS implementations, however I don't
> know offhand if any other have the restrictions like yours.

Hi,

I guess I did not make myself clear :)

The example only shows how this patch removes a true data dependency;
it does not imply any restrictions.

E.g.
    ADDC(sum, t0)
    ADDC(sum, t1)
    ADDC(sum, t2)
    ADDC(sum, t3)

which expands to the following instructions:
    (1)  daddu sum, t0;
    (2)  sltu  v1, sum, t0;
    (3)  daddu sum, v1;

    (4)  daddu sum, t1;
    (5)  sltu  v1, sum, t1;
    (6)  daddu sum, v1;

    (7)  daddu sum, t2;
    (8)  sltu  v1, sum, t2;
    (9)  daddu sum, v1;

    (10) daddu sum, t3;
    (11) sltu  v1, sum, t3;
    (12) daddu sum, v1;

Here, each instruction depends on the result of the previous one, which
is tough for any superscalar pipeline.

With the patch applied, it becomes:
    ADDC(t0, t1)
    ADDC(t2, t3)
    ADDC(sum, t0)
    ADDC(sum, t2)

which expands to the following instructions:
    (1)  daddu t0, t1;
    (2)  sltu  v1, t0, t1;
    (3)  daddu t0, v1;

    (4)  daddu t2, t3;
    (5)  sltu  v1, t2, t3;
    (6)  daddu t2, v1;

    (7)  daddu sum, t0;
    (8)  sltu  v1, sum, t0;
    (9)  daddu sum, v1;

    (10) daddu sum, t2;
    (11) sltu  v1, sum, t2;
    (12) daddu sum, v1;

Here, at least (1) and (4), for example, can be issued in the same
cycle, since they operate on different registers, as long as the CPU
has enough execution units and a large enough RS (Reservation Station),
fetches instructions quickly enough, etc.

What I want to say is that this patch removes some ** true data
dependencies **, and hence should improve performance on (most?)
superscalar pipeline implementations.
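As a rough C model of the two orderings above (an illustrative sketch only, not code from the patch): the names addc, csum_chain, csum_paired and addc_selfcheck are made up here, and 64-bit registers are assumed to match daddu. It relies on end-around-carry addition being associative and commutative, so regrouping the operands should leave the checksum unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C model of ADDC(sum, reg): add, detect carry, fold it back. */
static uint64_t addc(uint64_t sum, uint64_t reg)
{
    sum += reg;                     /* daddu sum, reg     */
    uint64_t carry = sum < reg;     /* sltu  v1, sum, reg */
    return sum + carry;             /* daddu sum, v1      */
}

/* Original order: every step waits on the previous value of sum. */
static uint64_t csum_chain(uint64_t sum, uint64_t t0, uint64_t t1,
                           uint64_t t2, uint64_t t3)
{
    sum = addc(sum, t0);
    sum = addc(sum, t1);
    sum = addc(sum, t2);
    sum = addc(sum, t3);
    return sum;
}

/* Patched order: the first two steps share no values with each other. */
static uint64_t csum_paired(uint64_t sum, uint64_t t0, uint64_t t1,
                            uint64_t t2, uint64_t t3)
{
    t0 = addc(t0, t1);              /* independent of the next line */
    t2 = addc(t2, t3);
    sum = addc(sum, t0);
    sum = addc(sum, t2);
    return sum;
}

/* Self-check on hand-picked operands that force carries along the way. */
static void addc_selfcheck(void)
{
    const uint64_t sum = 0xFFFFFFFFFFFFFFFFULL;
    const uint64_t t0 = 1, t1 = 0xFFFFFFFFFFFFFFFEULL;
    const uint64_t t2 = 0x8000000000000000ULL, t3 = 0x8000000000000001ULL;

    assert(csum_chain(sum, t0, t1, t2, t3) ==
           csum_paired(sum, t0, t1, t2, t3));
}
```

Calling addc_selfcheck() from a test harness only checks that the two orders agree; it says nothing about timing, which still has to be measured on real hardware.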
-- 
Regards,

- cee1