Oh, btw, how bad would it be to just do

    #define FASTOP_SIZE 16
    static_assert(FASTOP_SIZE >= FASTOP_LENGTH)

and leave it at that?

Afaik both gcc and clang default to -falign-functions=16 *anyway*, and while on 32-bit x86 we have options to minimize alignment, we don't do that on x86-64 afaik.

In fact, we have an option to force *bigger* alignment (DEBUG_FORCE_FUNCTION_ALIGN_64B) but not any way to make it less.

And we use

    .p2align 4

in most of our asm, along with

    #define __ALIGN .p2align 4, 0x90

So all the *normal* functions already get 16-byte alignment anyway.

So yeah, it would be less dense, but do we care? Wouldn't the "this is really simple" be a nice thing?

It's not like there are a ton of those fastop functions anyway. 128 of them? Plus 16 of the "setCC" ones?

          Linus
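
For reference, a minimal sketch of the fixed-slot idea in standalone C11, not kernel code. The FASTOP_LENGTH value is made up for illustration, and the two helper functions are hypothetical names that only show the slot arithmetic:

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */

/* Hypothetical value for illustration: longest fastop stub, in bytes. */
#define FASTOP_LENGTH 11

/* Fixed slot size, same as the usual 16-byte function alignment. */
#define FASTOP_SIZE 16

/* The build fails if a stub ever outgrows its slot. */
static_assert(FASTOP_SIZE >= FASTOP_LENGTH,
	      "fastop stub does not fit in its slot");

/* Hypothetical helper: byte offset of the n-th stub in a table of slots. */
static inline size_t fastop_offset(size_t n)
{
	return n * FASTOP_SIZE;
}

/* Hypothetical helper: total bytes taken by `count` stubs. */
static inline size_t fastop_table_bytes(size_t count)
{
	return count * FASTOP_SIZE;
}
```

If the guessed counts are right (128 plus 16), fastop_table_bytes(128 + 16) comes to 2304 bytes at 16 bytes per slot, which is the whole density cost being discussed.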