On Thu, May 6, 2021 at 12:14 PM Jacob Xu <jacobhxu@xxxxxxxxxx> wrote:
>
> > memset() takes a void *, which it casts to a char, i.e. it works on one byte at
> > a time.
> Huh, TIL. Based on this I'd thought that I don't need a cast at all,
> but doing so actually results in a movaps instruction.
> I've changed the cast back to (uint8_t *).

I'm pretty sure you're just getting lucky. If 'mem' is not 16-byte
aligned, the behavior of the code is undefined. The compiler does not
have to discard what it can infer about the alignment just because you
cast 'mem' to a type with weaker alignment constraints.

Why does 'mem' need to have type 'sse_union *'? Why can't it just be
declared as 'uint8_t *'? Just add a "memory" clobber to the inline asm
statements that use 'mem' as an SSE operand. Of course, passing it as
an argument to sseeq() also implies 16-byte alignment. Perhaps sseeq()
should take uint32_t pointers as arguments rather than sse_union
pointers. I'm not convinced that the sse_union buys us anything other
than trouble.
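
Roughly what I have in mind, as a sketch rather than a drop-in patch.
store_unaligned() is a made-up helper name, the sseeq() signature is
only my suggestion from above, and I'm assuming movups is acceptable
here, since movaps faults on an address that is not 16-byte aligned.
The non-x86-64 branch exists only so the sketch builds elsewhere.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy 16 bytes from 'src' to 'mem' through an SSE
 * register. 'mem' is a plain byte pointer, so the compiler cannot assume
 * any 16-byte alignment from its type.
 */
static void store_unaligned(uint8_t *mem, const uint32_t *src)
{
    assert(mem != NULL && src != NULL);
#if defined(__x86_64__) && defined(__GNUC__)
    /*
     * The pointers are passed as register inputs, so the compiler does not
     * see which bytes the asm reads or writes. The "memory" clobber tells
     * it that memory may be read and modified behind its back.
     */
    __asm__ __volatile__("movups (%1), %%xmm1\n\t"
                         "movups %%xmm1, (%0)"
                         :
                         : "r"(mem), "r"(src)
                         : "xmm1", "memory");
#else
    /* Portable fallback, only here so the sketch compiles on other targets. */
    memcpy(mem, src, 16);
#endif
}

/*
 * Assumed signature: compare four 32-bit lanes through uint32_t pointers
 * instead of sse_union pointers, so callers only promise 4-byte alignment.
 */
static int sseeq(const uint32_t *a, const uint32_t *b)
{
    for (int i = 0; i < 4; i++) {
        if (a[i] != b[i])
            return 0;
    }
    return 1;
}
```

A caller would point 'mem' into an ordinary uint8_t buffer, call
store_unaligned(mem, val), then memcpy() the 16 bytes back into a
uint32_t[4] before handing it to sseeq(), so that no pointer ever
claims more alignment than the underlying object really has.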