On Sun, Jan 14, 2018 at 10:40:36PM +0100, Arnd Bergmann wrote:
> Right. I've done some more investigation anyway, starting over with the
> analysis of the gcc options that change it. I've found now that turning
> off '-fcode-hoisting' but leaving on the other options I had suspected
> earlier (-O2 instead of -Os, -ftree-sra, -ftree-pre) also fixes the
> stack problem, and appears to result in the best performance so
> far.

Oh nice!

> I need to rerun the whole test matrix, but that seems rather
> promising, and the result may also help debug what's really happening.

-fcode-hoisting moves all expression evaluation to as early as possible;
for this AES code that means it will increase register pressure a lot,
causing a lot of spilling (well, that is my guess).

If that is so, then we need to dial down -fcode-hoisting a bit, maybe
make it aware of register pressure.

Glad you found a smoking gun,


Segher
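
For illustration only, here is a hand-written sketch of the kind of
transformation code hoisting performs: an expression that is evaluated on
every path gets evaluated once, earlier. The function names and the toy
expression are made up for this example and are not taken from the AES
code; the compiler does this on its internal representation, not on the
source, so treat it as an approximation.

```c
#include <stdint.h>

/* Hypothetical toy function: (a ^ b) is evaluated on both paths. */
uint32_t before_hoisting(uint32_t a, uint32_t b, int cond)
{
	if (cond)
		return (a ^ b) + 1;
	else
		return (a ^ b) << 1;
}

/*
 * Roughly the hoisted form: (a ^ b) is computed once, above the branch.
 * The temporary t is now live across the branch, so it occupies a
 * register (or a stack slot) for longer than in the original.
 */
uint32_t after_hoisting(uint32_t a, uint32_t b, int cond)
{
	uint32_t t = a ^ b;

	if (cond)
		return t + 1;
	else
		return t << 1;
}
```

With a single temporary this costs nothing, but if many values are
hoisted at once, each one stays live over a longer range, which is where
the extra register pressure would presumably come from. Building the
affected file with and without -fno-code-hoisting and comparing the stack
usage is one way to check the guess.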