Hiroshi Shimamoto <h-shimamoto@xxxxxxxxxxxxx> writes:

> I noticed that the stack usage of the code gcc-4.x generates looks
> inefficient on x86 and x86_64.  I found this while looking at the
> assembly code of the Linux kernel.
>
> Is this inefficient stack usage a regression?

It does seem to be a regression in this case.  It appears to be the
result of the tree reassociation pass.  That pass reassociates trees
in order to expose redundancies which can then be eliminated.  Your
code ties all the expressions together via the | operation, and they
all get sorted together.  This increases the live ranges of the
operands, and nothing ever fixes that up.

> I made a simple test case.

Note that your test case is wrong.

> #define copy_from_asm(x, addr, err) \
> asm volatile( \
> "1:\tmovl %2, %1\n" \
> "2:\n" \
> ".section .fixup,\"ax\"\n" \
> "\txor %1,%1\n" \
> "\tmov $1,%0\n" \
> "\tjmp 2b\n" \
> ".previous\n" \
> : "=r" (err), "=r" (x) \
> : "m" (*(int*)(addr)))

This says that it sets "err", but it doesn't always do so.  I
modified the last line to this:

  : "m" (*(int*)(addr)), "0" (err))

which ensures that the register holding 'err' is initialized.

Please feel free to report a bug; see http://gcc.gnu.org/bugs.html .

Note that your code relies on the fact that the asm does not change
err in the normal case.  You will get much better code if you take
advantage of that fact:

#define copy_from(x, addr, err) do { \
  copy_from_asm((x), (addr), (err)); \
  } while (0)

#define copy(x, addr, err) ({ \
  copy_from((x), (addr), err); \
  })

#define my_copy(x) do { copy(dst[x], &src[x], err); } while (0)

Ian