Re: [PATCH] LoongArch: vDSO: Tune the chacha20 implementation

"Jason A. Donenfeld" <Jason@xxxxxxxxx> · Fri, 20 Sep 2024 17:11:05 +0200

On Thu, Sep 19, 2024 at 05:13:59PM +0800, Xi Ruoyao wrote:
> As Christophe pointed out, tuning the chacha20 implementation by
> scheduling the instructions like what GCC does can improve the
> performance.
> 
> The tuning does not introduce too much complexity (basically it's just
> reordering some instructions).  And the tuning does not hurt readability
> too much: actually the tuned code looks even more similar to a
> textbook-style implementation based on 128-bit vectors.  So overall it's
> a good deal to me.
> 
> Tested with vdso_test_getchacha and benched with vdso_test_getrandom.
> On an LA664 the speedup is 5%, and I expect a larger speedup on LA[2-4]64
> with a lower issue rate.
> 
> Suggested-by: Christophe Leroy <christophe.leroy@xxxxxxxxxx>
> Link: https://lore.kernel.org/all/77655d9e-fc05-4300-8f0d-7b2ad840d091@xxxxxxxxxx/
> Signed-off-by: Xi Ruoyao <xry111@xxxxxxxxxxx>

That seems like a reasonable optimization to me. I'll queue it up in
random.git and send it in my pull next week.

Thanks.

Jason