Unaligned variables may cause split cache-line loads and stores, which can be counted with the perf events mem_inst_retired.split_loads and mem_inst_retired.split_stores.

Does GCC have a compile-time option that controls whether variables allocated on the stack are aligned?

I tested with the following snippet, and it appears that neither GCC nor Clang guarantees this by default:

    #include <stdio.h>

    unsigned long rbp = 0;

    int main(void)
    {
        /* Save the frame pointer so the addresses below can be compared against it. */
        asm("movq %%rbp, %0" : "=r"(rbp));
        char a = 'c';
        int b = 1;
        char e = 'b';
        long c = 20;
        /* Print each variable's address (as a decimal integer) followed by its size. */
        printf("%lu\n%lu, %zu\n%lu, %zu\n%lu, %zu\n%lu, %zu",
               rbp,
               (unsigned long)&a, sizeof(a),
               (unsigned long)&b, sizeof(b),
               (unsigned long)&e, sizeof(e),
               (unsigned long)&c, sizeof(c));
        return 0;
    }

Output:

    $ ./a.out
    140733249436400
    140733249436159, 1
    140733249436152, 4
    140733249436151, 1
    140733249436136, 8