Pankaj Kohli writes:

> What about this one ? Three PUSHes + sub 0x10,%esp. That makes 28 bytes.
> Stack is not aligned on 16-byte boundary in this case.

Rather than answer another question on this subject, I'm going to ask you to think a little.

The problem we're trying to avoid is that a memory access that straddles two cache lines can be very slow. Have a look at your sample code, and tell me whether you think it might be a problem in this case.

Andrew.