Hi,

The other day I wrote a few routines in assembler (using the Win64 calling convention). In practice it was more like writing the code in C, compiling it with gcc, running `objdump -D a.out | less`, then taking the generated code and making the necessary changes (saving and restoring %rdi and %rsi on entry and exit). All was great.

Still, in my search for speed I noticed that gcc generated a lot of stuff like:

    ...
    .data 16
    .data 16
    nop
    nop
    ...

which is the result of ".p2align 4,,15" (on the net, apparently this is, and I quote, "like a "turbo" switch on some benchmarks"). I said to myself "good to know" and made the necessary changes in my "*.S" files. Indeed, what used to be nasty unaligned code now sits nicely on a 16-byte boundary.

However, to my disappointment, this did not make the code run faster :(. "Au contraire", it made it run slower. So why is gcc using it? Or am I missing something?

I've tested this on an AMD64 (Turion @ 2.2GHz) machine.

-- 
Mihai Donțu
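
For reference, ".p2align 4,,15" asks the assembler to pad up to the next 2^4 = 16-byte boundary (with no-ops in a code section), but only if that takes at most 15 bytes of padding. The padding arithmetic can be sketched in C as below. This is only an illustration of the directive's documented meaning, not how gas implements it, and the helper name `p2align_padding` is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of gcc or binutils): number of padding
 * bytes that ".p2align power,,max_skip" would emit at address addr.
 * Returns 0 when addr is already aligned, or when reaching the boundary
 * would need more than max_skip bytes (the directive is then skipped).
 */
static unsigned p2align_padding(uint64_t addr, unsigned power, unsigned max_skip)
{
    assert(power < 64);                      /* keep the shift well defined */

    uint64_t align = (uint64_t)1 << power;   /* e.g. power 4 -> 16 bytes */
    uint64_t misalign = addr & (align - 1);  /* offset past the last boundary */
    unsigned pad = (unsigned)((align - misalign) & (align - 1));

    return pad <= max_skip ? pad : 0;
}
```

With a maximum of 15, a 16-byte alignment is always applied, so code at 0x1003 gets 13 bytes of padding. A smaller limit such as ".p2align 4,,7" would leave that same address unpadded. Those padding bytes cost code size and, when they sit on an executed path, have to be fetched and decoded as well.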