Re: What CPU extensions can we assume are available by arch?

Richard Shaw <hobbes1069@xxxxxxxxx> · Sun, 26 Apr 2020 07:07:37 -0500

On Sat, Apr 25, 2020 at 7:24 AM Kevin Kofler <kevin.kofler@xxxxxxxxx> wrote:

> Richard Shaw wrote:
>> As far as LPCNet itself I've communicated with the primary developer quite
>> a bit over the last week. LPCNet *will not work* without optimizations (at
>> least not in real time which is the point).
>
> Has anyone (upstream or elsewhere) ever looked into doing an SSE2 version of
> the vector code? It should be faster than scalar (especially considering
> that the "scalar" floating-point code (under the default -mfpmath=sse)
> actually loads everything into SSE2 registers as well, but does not actually
> make use of the vectorization) and it would match the baseline of many
> distributions and upstreams out there.
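To make the scalar-versus-vector point concrete, here is a minimal sketch of a dot product written with SSE/SSE2 intrinsics, falling back to plain scalar code when SSE2 is not enabled at compile time. The function name dot_product is hypothetical and the code is not taken from LPCNet or from the PR; it only illustrates the kind of four-floats-at-a-time loop an SSE2 path would use.

```c
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 header; also provides the SSE float intrinsics */
#endif

/* Hypothetical helper (not LPCNet's API): dot product of two float arrays. */
float dot_product(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    size_t i = 0;

#if defined(__SSE2__)
    __m128 acc = _mm_setzero_ps();              /* four running partial sums */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);        /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(va, vb));
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);                  /* spill the 4 partial sums */
    sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif

    /* Scalar tail; this is the whole loop when SSE2 is unavailable. */
    for (; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

Without the intrinsics the compiler still keeps each float in an SSE register under -mfpmath=sse, but it multiplies one element per instruction unless auto-vectorization kicks in, which is the gap described above.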

It's funny: we had this conversation just yesterday, and I woke up to a pull request adding SSE support.

https://github.com/drowe67/LPCNet/pull/25

TL;DR version: on my Ryzen 5 2600, SSE4.1 barely improved performance with the current LPCNet code (about 9% less decode time than the unoptimized build). The good news is that a beefy processor can decode faster than real time without optimizations, but that can't be assumed for everyone. There will be people wanting to run this software on lower-end laptops that can't keep up in real time.
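Since the available extensions can't be assumed, one option is to pick the code path at run time. The following is only a sketch under assumptions: it relies on the GCC/Clang builtins __builtin_cpu_init and __builtin_cpu_supports on x86, and the enum and function names are hypothetical, not something LPCNet or the PR provides.

```c
/* Hypothetical labels for the SIMD code paths; not taken from LPCNet. */
typedef enum { SIMD_NONE, SIMD_SSE41, SIMD_AVX, SIMD_AVX2 } simd_level;

/* Return the best SIMD level the running CPU reports, or SIMD_NONE. */
simd_level detect_simd_level(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                      /* populate the CPU feature data */
    if (__builtin_cpu_supports("avx2"))   return SIMD_AVX2;
    if (__builtin_cpu_supports("avx"))    return SIMD_AVX;
    if (__builtin_cpu_supports("sse4.1")) return SIMD_SSE41;
#endif
    /* Other compilers or architectures: assume no x86 SIMD path. */
    return SIMD_NONE;
}

/* Human-readable name matching the labels used in the results table. */
const char *simd_level_name(simd_level level)
{
    switch (level) {
    case SIMD_AVX2:  return "AVX2";
    case SIMD_AVX:   return "AVX";
    case SIMD_SSE41: return "SSE 4.1";
    default:         return "None";
    }
}
```

A package built this way would ship every code path and select one at startup, so the binary still runs on a baseline machine while faster CPUs get the AVX/AVX2 path.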

Below is a quick table from the PR showing relative decode performance per SIMD pathway:

Fedora 31, gcc 9.3.1, Ryzen 5 2600

SIMD      Time (s)   % real time
None      19.796     39.8%
SSE 4.1   17.971     36.1%
AVX       10.185     20.5%
AVX2       9.459     19.0%

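For anyone wanting to check the arithmetic, here is a small sketch that derives speedups from the table. It assumes "% real time" means decode time divided by the duration of the audio (the PR table does not define it), in which case all four rows imply a test clip of roughly 50 seconds. The struct and function names are made up for illustration.

```c
/* One row of the results table above (values copied from the PR table). */
typedef struct {
    const char *simd;
    double seconds;        /* decode time in seconds */
    double pct_real_time;  /* decode time as a percentage of audio duration (assumed meaning) */
} decode_result;

const decode_result pr_results[] = {
    { "None",    19.796, 39.8 },
    { "SSE 4.1", 17.971, 36.1 },
    { "AVX",     10.185, 20.5 },
    { "AVX2",     9.459, 19.0 },
};

/* Audio duration implied by a row, under the assumption stated above. */
double implied_audio_seconds(const decode_result *r)
{
    return r->seconds / (r->pct_real_time / 100.0);
}

/* How many times faster r is than the baseline row. */
double speedup_over(const decode_result *baseline, const decode_result *r)
{
    return baseline->seconds / r->seconds;
}
```

With these numbers, speedup_over gives about 1.1x for SSE 4.1 and about 2.1x for AVX2 relative to the unoptimized build.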
Thanks,
Richard
