> Due to the fact that the x86 port does not support allocating objects
> on the stack with an alignment that exceeds 8 bytes, we have a rather
> ugly hack in the x86 code for ChaCha to ensure that the state array
> is aligned to 16 bytes, allowing the SSE3 implementation of the
> algorithm to use aligned loads.
>
> Given that the performance benefit of using aligned loads appears
> to be limited (~0.25% for 1k blocks using tcrypt on a Core i7-8650U),
> and the fact that this hack has leaked into generic ChaCha code,
> let's just remove it.

Reviewed-by: Martin Willi <martin@xxxxxxxxxxxxxx>

Thanks,
Martin