Hello list,

I am writing an assembly function that multiplies two 4x4 single-precision matrices. I wrote two versions, one using SSE and the other using SSE4.1. What surprised me is that the SSE4.1 version fails to beat the SSE version; it is in fact slightly slower.

Is this the right place to ask for help? If anyone is interested, I can post some code, which might clarify the situation. If this is not the right place, please ignore me...

nick
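
No code was posted with this message, so the following is only a sketch of two formulations that SSE and SSE4.1 versions of a 4x4 product commonly follow. It is written as portable scalar C rather than assembly. The function names and the row-major float[16] layout are assumptions for illustration, not taken from the post.

```c
#include <stddef.h>

/* Row-major 4x4 matrices stored as float[16]. All names are hypothetical.
   In both functions, out must not overlap a or b. */

/* Formulation A (the shape a plain SSE version often takes): each output
   row is a sum of the rows of b, scaled by the elements of the matching
   row of a. In SIMD, the loop over j is one 4-wide multiply plus one
   4-wide add, and s is broadcast to all four lanes. */
void mat4_mul_broadcast(const float *a, const float *b, float *out)
{
    for (size_t i = 0; i < 4; ++i) {
        float row[4] = {0.0f, 0.0f, 0.0f, 0.0f};
        for (size_t k = 0; k < 4; ++k) {
            float s = a[4 * i + k];             /* element to broadcast */
            for (size_t j = 0; j < 4; ++j)
                row[j] += s * b[4 * k + j];     /* four lanes at once in SIMD */
        }
        for (size_t j = 0; j < 4; ++j)
            out[4 * i + j] = row[j];
    }
}

/* Formulation B (the shape a dot-product instruction such as SSE4.1 DPPS
   suggests): each output element is the dot product of a row of a with a
   column of b, so every dot product yields a single scalar. */
void mat4_mul_dot(const float *a, const float *b, float *out)
{
    for (size_t i = 0; i < 4; ++i) {
        for (size_t j = 0; j < 4; ++j) {
            float dot = 0.0f;
            for (size_t k = 0; k < 4; ++k)
                dot += a[4 * i + k] * b[4 * k + j];
            out[4 * i + j] = dot;
        }
    }
}
```

Both functions compute the same product. The difference is in how the work is grouped: the first builds a whole output row from 4-wide operations, while the second produces the 16 results one dot product at a time. Whether that grouping explains the timing reported above depends on the actual assembly, which is not shown here.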