Hi list,

thanks to the many helpful comments and suggestions here, I was able to write the 4x4x4 multiply and even get somewhat decent performance out of the code. We have published our results on this here: http://arxiv.org/abs/1203.1692

Thanks again,
nick
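For readers arriving at this thread without the earlier messages, here is a minimal sketch of a 4x4 single-precision matrix product with SSE. It is written in C with the `<xmmintrin.h>` intrinsics rather than hand-written assembly, and it is not the code from the paper. The function names `matmul4x4_sse` and `matmul4x4_ref` are hypothetical. The sketch assumes row-major storage and an x86 compiler with SSE enabled, which is the default on x86-64.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_mul_ps, ... */

/* Hypothetical helper (not the code from the paper): computes c = a * b for
 * 4x4 single-precision matrices stored in row-major order.
 * Row i of c is a weighted sum of the rows of b:
 *     c[i][:] = a[i][0]*b[0][:] + a[i][1]*b[1][:] + a[i][2]*b[2][:] + a[i][3]*b[3][:]
 * so each term costs one broadcast, one packed multiply and one packed add. */
void matmul4x4_sse(const float *a, const float *b, float *c)
{
    /* Load the four rows of b once; unaligned loads avoid assuming
     * 16-byte alignment of the caller's arrays. */
    const __m128 b0 = _mm_loadu_ps(b + 0);
    const __m128 b1 = _mm_loadu_ps(b + 4);
    const __m128 b2 = _mm_loadu_ps(b + 8);
    const __m128 b3 = _mm_loadu_ps(b + 12);

    for (int i = 0; i < 4; i++) {
        const float *ai = a + 4 * i;
        /* Broadcast a[i][k] to all four lanes and accumulate over k = 0..3. */
        __m128 row = _mm_mul_ps(_mm_set1_ps(ai[0]), b0);
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(ai[1]), b1));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(ai[2]), b2));
        row = _mm_add_ps(row, _mm_mul_ps(_mm_set1_ps(ai[3]), b3));
        _mm_storeu_ps(c + 4 * i, row);
    }
}

/* Plain scalar triple loop, useful as a reference to check the SSE version. */
void matmul4x4_ref(const float *a, const float *b, float *c)
{
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            float sum = 0.0f;
            for (int k = 0; k < 4; k++)
                sum += a[4 * i + k] * b[4 * k + j];
            c[4 * i + j] = sum;
        }
    }
}
```

Broadcasting each element of `a` against whole rows of `b` avoids horizontal adds and shuffles of the result entirely. A tuned kernel would likely go further, for example with aligned loads and careful instruction scheduling.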