Ian Lance Taylor wrote:
> Hiroshi Shimamoto <h-shimamoto@xxxxxxxxxxxxx> writes:
>
>> I noticed that the stack usage of the code gcc-4.x generates looks
>> inefficient on x86 and x86_64. I found this while looking at the
>> assembly code of the Linux kernel.
>> Is this inefficient stack usage a regression?
>
> It does seem to be a regression in this case. It seems to be the
> result of the tree reassociation pass. That pass reassociates the
> trees in order to expose redundancies which can then be eliminated.
> Your code ties all the expressions together via the | operation, and
> those all get sorted together. This increases the live ranges of the
> operands, and nothing ever fixes that up.
>
>> I made a simple test case.
>
> Note that your test case is wrong.
>
>> #define copy_from_asm(x, addr, err)		\
>> 	asm volatile(				\
>> 		"1:\tmovl %2, %1\n"		\
>> 		"2:\n"				\
>> 		".section .fixup,\"ax\"\n"	\
>> 		"\txor %1,%1\n"			\
>> 		"\tmov $1,%0\n"			\
>> 		"\tjmp 2b\n"			\
>> 		".previous\n"			\
>> 		: "=r" (err), "=r" (x)		\
>> 		: "m" (*(int*)(addr)))
>
> This says that it sets "err", but it doesn't always do so. I modified
> the last line to this:
>
> 	: "m" (*(int*)(addr)), "0" (err))
>
> which ensures that the register holding 'err' is initialized.

Thanks for looking into this and for the correction. I'll check what
mistake I made when simplifying this issue.

> Please feel free to report a bug; see http://gcc.gnu.org/bugs.html .

Will do.

> Note that your code relies on the fact that the asm does not change
> err in the normal case. You will get much better code if you take
> advantage of that fact:

Thanks for pointing this out; I'm working on changing the code along
those lines.

Thanks,
Hiroshi
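
For reference, here is the test-case macro with Ian's correction folded
in (a sketch; as in the original test case there is no __ex_table entry,
so the .fixup code only models the kernel's pattern for codegen purposes):

/* The added "0" (err) input constraint pairs with output operand 0, so
 * GCC loads the current value of err into that register before the asm
 * runs; the normal path then leaves it unchanged, and err is well
 * defined whether or not the fixup path is taken. */
#define copy_from_asm(x, addr, err)			\
	asm volatile(					\
		"1:\tmovl %2, %1\n"			\
		"2:\n"					\
		".section .fixup,\"ax\"\n"		\
		"\txor %1,%1\n"				\
		"\tmov $1,%0\n"				\
		"\tjmp 2b\n"				\
		".previous\n"				\
		: "=r" (err), "=r" (x)			\
		: "m" (*(int*)(addr)), "0" (err))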
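
Ian's closing suggestion is quoted above without his example, but one
way to take advantage of err being unchanged on the normal path is to
make it a read-write operand (equivalent to the "=r"/"0" pairing) and
thread a single err variable through every copy, dropping the | chain
that the reassociation pass was sorting together. A minimal sketch of
that shape; copy_block and the four-element loop body are illustrative,
not code from the thread:

#define copy_from_asm(x, addr, err)			\
	asm volatile(					\
		"1:\tmovl %2, %1\n"			\
		"2:\n"					\
		".section .fixup,\"ax\"\n"		\
		"\txor %1,%1\n"				\
		"\tmov $1,%0\n"				\
		"\tjmp 2b\n"				\
		".previous\n"				\
		: "+r" (err), "=r" (x)			\
		: "m" (*(int*)(addr)))

/* err starts at 0 and accumulates across the copies: a faulting access
 * would set it to 1, a successful one leaves it alone, so there is no
 * err0 | err1 | ... expression for the compiler to reassociate and the
 * operands' live ranges stay short. */
static int copy_block(int *dst, const int *src)
{
	int err = 0;

	copy_from_asm(dst[0], src + 0, err);
	copy_from_asm(dst[1], src + 1, err);
	copy_from_asm(dst[2], src + 2, err);
	copy_from_asm(dst[3], src + 3, err);

	return err;
}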