Kevin Kofler <kevin.kofler@xxxxxxxxx> writes:

> Has anyone (upstream or elsewhere) ever looked into doing an SSE2 version of
> the vector code? It should be faster than scalar (especially considering
> that the "scalar" floating-point code (under the default -mfpmath=sse)
> actually loads everything into SSE2 registers as well, but does not actually
> make use of the vectorization) and it would match the baseline of many
> distributions and upstreams out there.

What's preventing vectorization with SSE2 (or other architectures' base
SIMD) anyhow, if anything?  Use something like
gcc -Ofast -fopt-info-vec-missed on the performance-critical parts for
clues.  Use -Ofast if you don't care about conforming maths, which you
presumably don't, especially if using NEON (32-bit ARM NEON isn't fully
IEEE 754 conformant, so GCC won't auto-vectorize floating-point code for
it unless unsafe maths optimizations are enabled).

_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx
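For illustration only, a small sketch of the kind of loops where -fopt-info-vec-missed tends to be informative. The file name and the functions scale_add and sum are hypothetical, not taken from the code under discussion. The flags are real GCC options, but the exact diagnostics differ between GCC versions. The assumption is that an independent element-wise loop usually vectorizes with plain SSE2 on x86-64, while a floating-point sum usually does not unless reassociation is allowed (as -Ofast does).

```c
/* Hypothetical demo file, e.g. vec_demo.c. Compare, for instance:
 *   gcc -O3    -fopt-info-vec-missed -c vec_demo.c
 *   gcc -Ofast -fopt-info-vec        -c vec_demo.c
 * On x86-64 SSE2 is part of the baseline ABI, so no -m flags are
 * needed for 128-bit vectorization. */
#include <stddef.h>

/* Element-wise loop: iterations are independent and `restrict`
 * promises the arrays don't overlap, so GCC can usually vectorize
 * this without relaxing IEEE semantics. */
void scale_add(float *restrict out, const float *restrict x,
               const float *restrict y, float a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a * x[i] + y[i];
}

/* Floating-point reduction: a vectorized version would add the
 * elements in a different order, which can change rounding, so GCC
 * normally declines unless reassociation is permitted (e.g. with
 * -Ofast / -ffast-math). */
float sum(const float *x, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}
```

If the missed-vectorization report points at something like the second loop, the choice is between accepting non-conforming maths or restructuring the code (for example, with explicit partial sums) so that the compiler is allowed to use SIMD.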