On Sat, Apr 25, 2020 at 7:24 AM Kevin Kofler <kevin.kofler@xxxxxxxxx> wrote:
> Richard Shaw wrote:
> > As far as LPCNet itself I've communicated with the primary developer quite
> > a bit over the last week. LPCNet *will not work* without optimizations (at
> > least not in real time which is the point).
>
> Has anyone (upstream or elsewhere) ever looked into doing an SSE2 version of
> the vector code? It should be faster than scalar (especially considering
> that the "scalar" floating-point code (under the default -mfpmath=sse)
> actually loads everything into SSE2 registers as well, but does not actually
> make use of the vectorization) and it would match the baseline of many
> distributions and upstreams out there.
It's funny, we just had this conversation yesterday, and I woke up to a pull request to add SSE support.
TL;DR version: on my Ryzen 5 2600, SSE4.1 barely improved performance with the current LPCNet code. The good news is that a beefy processor can perform better than real time without optimizations, but that can't be assumed for everyone. There will be people wanting to run this software on lower-end laptops that can't keep up in real time.
Below is a quick table from the PR showing relative decode performance per SIMD pathway:
- Fedora 31
- gcc 9.3.1
- Ryzen 5 2600
| SIMD | Time (s) | % of real time |
|---|---|---|
| None | 19.796 | 39.8% |
| SSE4.1 | 17.971 | 36.1% |
| AVX | 10.185 | 20.5% |
| AVX2 | 9.459 | 19.0% |
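
For anyone who prefers speedup factors, the arithmetic behind the table can be sketched as below. This is only a rough sketch and not part of the PR: it assumes "% real time" means decode time divided by clip duration, and the names `decode_time_s`, `clip_duration_s`, `speedup`, and `percent_real_time` are made up for illustration. The clip length is not stated anywhere in the thread; it is inferred from the baseline row.

```python
# Decode times in seconds, copied from the table above.
decode_time_s = {
    "None": 19.796,
    "SSE4.1": 17.971,
    "AVX": 10.185,
    "AVX2": 9.459,
}

# Assumption: "% real time" = decode time / clip duration, so the clip
# length can be estimated from the baseline row (19.796 s at 39.8%).
clip_duration_s = decode_time_s["None"] / 0.398


def speedup(simd: str) -> float:
    """Speedup of a SIMD path relative to the build with no SIMD."""
    return decode_time_s["None"] / decode_time_s[simd]


def percent_real_time(simd: str) -> float:
    """Decode time as a percentage of the (inferred) clip duration."""
    return 100.0 * decode_time_s[simd] / clip_duration_s


for simd in decode_time_s:
    print(f"{simd:8s} {speedup(simd):.2f}x  {percent_real_time(simd):.1f}% of real time")
```

Under that assumption the inferred clip is just under 50 seconds, and the other three percentages in the table come out the same when recomputed. SSE4.1 is roughly a 1.1x speedup over the unoptimized build, while AVX and AVX2 are roughly 2x.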
Thanks,
Richard
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx