Another thing that might be helpful is that you can let gcc figure out the alignment at compile time, and then optimize appropriately. Check out what we do with siphash:

https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/tree/include/linux/siphash.h#n76

static inline u64 siphash(const void *data, size_t len,
			  const siphash_key_t *key)
{
#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
	if (!IS_ALIGNED((unsigned long)data, SIPHASH_ALIGNMENT))
		return __siphash_unaligned(data, len, key);
#endif
	return ___siphash_aligned(data, len, key);
}

With this trick, we fall through to the fast alignment-assuming code if gcc can prove that the address is aligned. Since siphash() is a static inline, the IS_ALIGNED() check is evaluated at each call site, and gcc can drop the branch entirely when the alignment is known there. This is often the case when passing structs, or when passing buffers that have __aligned(BLOCKSIZE). It proves to be a very useful optimization on some platforms.
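For illustration, here is a rough userspace sketch of the same dispatch pattern. It is not the kernel code: sum_words(), its two helpers, and the IS_ALIGNED_TO / WORD_ALIGNMENT macros are made-up names, the "hash" is just a sum of 64-bit words, and the CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS guard is left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for SIPHASH_ALIGNMENT and the kernel's IS_ALIGNED(). */
#define WORD_ALIGNMENT _Alignof(uint64_t)
#define IS_ALIGNED_TO(p, a) ((((uintptr_t)(p)) & ((a) - 1)) == 0)

/* Slow path: copies each word out with memcpy, so any address is fine. */
static uint64_t sum_words_unaligned(const void *data, size_t nwords)
{
	const unsigned char *p = data;
	uint64_t sum = 0, w;

	for (size_t i = 0; i < nwords; i++) {
		memcpy(&w, p + i * sizeof(w), sizeof(w));
		sum += w;
	}
	return sum;
}

/* Fast path: loads whole words directly and relies on the caller's check. */
static uint64_t sum_words_aligned(const void *data, size_t nwords)
{
	const uint64_t *p = data;
	uint64_t sum = 0;

	assert(IS_ALIGNED_TO(data, WORD_ALIGNMENT));
	for (size_t i = 0; i < nwords; i++)
		sum += p[i];
	return sum;
}

/*
 * Inline wrapper: after inlining, the compiler sees the caller's pointer.
 * If it can prove the alignment, the test below is a compile-time constant
 * and only the fast-path call remains.
 */
static inline uint64_t sum_words(const void *data, size_t nwords)
{
	if (!IS_ALIGNED_TO(data, WORD_ALIGNMENT))
		return sum_words_unaligned(data, nwords);
	return sum_words_aligned(data, nwords);
}
```

A caller that passes a uint64_t array, or a buffer declared with an explicit alignment, gives the compiler what it needs to fold the check away; a pointer at an odd offset into a byte buffer takes the memcpy path instead. Whether the branch really disappears depends on the compiler and optimization level, so it is worth checking the generated code.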