Re: Gotos

Momchil Velikov <velco@fadata.bg> · 09 Jan 2004 10:51:22 +0200

>>>>> "Phil" == Phil White <cerise@littlegreenmen.armory.com> writes:

Phil> This is flame bait.  As a result, I promise only one reply to
Phil> it, should I receive criticism.  That way, it needn't annoy
Phil> everyone else very much ; )

You have rather strange (and IMHO quite unpopular) criteria for
classifying something as a flamebait.

Phil> Actually, if you try compiling that file with the patch and
Phil> without it, they do not turn out the same.  I tried without -O,
Phil> with -O2, and with -O3.

Phil> In all cases (on my x86), it turns out in favor of the goto.

We've heard your opinion.  Looking forward to the arguments ...

Phil> Your reply smacks of a rather distasteful approach towards
Phil> gotos: That they shouldn't ever be used.  Gotos have a definite
Phil> use as Knuth pointed out in his response to Dijkstra.

Please, quote the part where I said something against gotos.

Phil> That it doesn't matter is clearly false given this snippet of
Phil> code.  It saves code space.

Phil> Does it make the code any less readable to have it exit at a
Phil> common point?

Phil> In my opinion, I've found multiple exits harder to track down
Phil> than gotos pointing at a common exit point in code.

Phil> The label in this case is well chosen.  "out" is pretty
Phil> obviously leading out of the function.  I don't think this
Phil> qualifies as spaghetti code seeing as how it jumps to a well
Phil> defined label and out.

Phil> As it is now, it is quantifiably better to use gotos here rather
Phil> than returns.  I don't think readability is hurt at all by it.

  Please, stop with the strawmen, ok ? I've never said anything about
the readability of the function.

  As for "quantifiably better", how about defining it ? Do you mind
defining "better" as any one of:

  a) smaller code size
  b) smaller number of cycles spent, due to instructions
  c) smaller number of cycles spent, due to memory loads/stores

  And note the "any one of": in different cases people may prefer to
strive for a different one (if they conflict).  Also, I distinguish
between cycles spent for the reasons in b) and c) because on different
platforms their relative significance differs.

So, let's take a small stand-alone example:

The goto variant (hereafter denoted x1.c):
------------------------------------------

int foo (), bar ();

int
baz ()
{
  int ret = 1;

  if (foo ())
    {
      ret = 0;
      goto out;
    }

  bar ();

 out:
  return ret;
}

The gotoless variant (hereafter denoted x2.c):
------------------------------------------

int foo (), bar ();

int
baz ()
{
  int ret = 1;

  if (foo ())
    {
      ret = 0;
      return ret;
    }

  bar ();

 out:
  return ret;
}

The compiler is:

$ gcc --version
gcc (GCC) 3.3.2

1. First, compile with ``-S -O3 -fomit-frame-pointer''

x1.s:
-----
	.file	"x1.c"
	.text
	.p2align 4,,15
.globl baz
	.type	baz, @function
baz:
	subl	$12, %esp
	movl	%ebx, 8(%esp)
	movl	$1, %ebx
	call	foo
	testl	%eax, %eax
	je	.L2
	xorl	%ebx, %ebx
.L3:
	movl	%ebx, %eax
	movl	8(%esp), %ebx
	addl	$12, %esp
	ret
	.p2align 4,,7
.L2:
	call	bar
	jmp	.L3
	.size	baz, .-baz
	.ident	"GCC: (GNU) 3.3.2"

x2.s:
-----
	.file	"x2.c"
	.text
	.p2align 4,,15
.globl baz
	.type	baz, @function
baz:
	subl	$12, %esp
	call	foo
	xorl	%edx, %edx
	testl	%eax, %eax
	je	.L4
.L1:
	movl	%edx, %eax
	addl	$12, %esp
	ret
	.p2align 4,,7
.L4:
.L3:
	call	bar
	movl	$1, %edx
	jmp	.L1
	.size	baz, .-baz
	.ident	"GCC: (GNU) 3.3.2"

  Ok, the first thing we note is that there's a single epilogue
sequence in both variants.  The second thing is that the first variant
has extra instructions to save/restore %ebx, which cost extra cycles on
two accounts - as extra instructions and as extra memory accesses.

  Summary: a) code size - x2.c (gotoless) is smaller by 2 insns
           b) insn cycles - x2.c executes fewer insns
           c) memory cycles - x2.c performs fewer loads/stores

  IOW, the gotoless variant is "better" according to the above criteria.

2. As we were speaking of code size, let's compile for code size:
   ``-S -Os -fomit-frame-pointer''.

x1.s:
-----
	.file	"x1.c"
	.text
.globl baz
	.type	baz, @function
baz:
	pushl	%ebx
	movl	$1, %ebx
	call	foo
	testl	%eax, %eax
	je	.L2
	xorl	%ebx, %ebx
	jmp	.L3
.L2:
	call	bar
.L3:
	movl	%ebx, %eax
	popl	%ebx
	ret
	.size	baz, .-baz
	.ident	"GCC: (GNU) 3.3.2"

x2.s:
-----
	.file	"x2.c"
	.text
.globl baz
	.type	baz, @function
baz:
	call	foo
	xorl	%edx, %edx
	testl	%eax, %eax
	jne	.L1
.L3:
	call	bar
	movl	$1, %edx
.L1:
	movl	%edx, %eax
	ret
	.size	baz, .-baz
	.ident	"GCC: (GNU) 3.3.2"

Again, there's a single epilogue and the goto variant clobbers %ebx
(and thus needs to save/restore it).

Summary: a) code size - x2.s is smaller by 3 insns
         b) insn cycles - x2.s executes fewer insns
         c) memory cycles - x2.s performs fewer memory loads/stores

  IOW, the gotoless variant is "better" according to the above criteria.

Let's see what happens on a different architecture.

$ arm-elf-gcc --version
arm-elf-gcc (GCC) 3.3.2

3. Compile with ``-S -O3 -fomit-frame-pointer''

x1.s:
-----
	.file	"x1.c"
	.text
	.align	2
	.global	baz
	.type	baz, %function
baz:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 1, uses_anonymous_args = 0
	mov	ip, sp
	stmfd	sp!, {r4, fp, ip, lr, pc}
	sub	fp, ip, #4
	bl	foo
	cmp	r0, #0
	mov	r4, #1
	movne	r4, #0
	bleq	bar
.L3:
	mov	r0, r4
	ldmea	fp, {r4, fp, sp, pc}
	.size	baz, .-baz
	.ident	"GCC: (GNU) 3.3.2"

x2.s:
-----
	.file	"x2.c"
	.text
	.align	2
	.global	baz
	.type	baz, %function
baz:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 1, uses_anonymous_args = 0
	mov	ip, sp
	stmfd	sp!, {fp, ip, lr, pc}
	sub	fp, ip, #4
	bl	foo
	cmp	r0, #0
	mov	r0, #0
	ldmneea	fp, {fp, sp, pc}
.L3:
	bl	bar
	mov	r0, #1
	ldmea	fp, {fp, sp, pc}
	.size	baz, .-baz
	.ident	"GCC: (GNU) 3.3.2"

  The second variant contains two epilogues; the first variant again
gratuitously clobbers a register, with the resulting increase in
instruction cycles and memory accesses.

   Summary:  a) code size - both are 10 insns
             b) insn cycles - x2.c is better because it executes 7
                insns in one case and 10 in the other, whereas x1.c
                executes 10 insns in either case.
             c) memory cycles - x1.c does an extra save/restore of r4

  IOW, the gotoless variant is "better" according to the above criteria.

4. Compiling for code size - ``-S -Os -fomit-frame-pointer''

  Produces identical output to the previous.

  Now, what does it all prove ? Of course, it PROVES nothing (just
like your supposed compilations do), it DEMONSTRATES something.

  It demonstrates that compilers are a lot smarter than one may think.
Putting ``goto'' in the source does not necessarily mean the compiler
will generate an unconditional jump instruction at that point.  Jump
optimizations (e.g. jump-to-jump elimination) can eliminate gotos
altogether.  Basic block reordering can totally change the "shape" of
the code, as "apparent" from the C source.  Conditional instructions
together with if-conversion can eliminate jumps too.  Etc., etc. ...
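
  To make this concrete, here is a minimal sketch (not part of x1.c or
x2.c) with three source shapes of the same function: goto, early
return, and no explicit jump at all.  The stubs foo_result and
bar_calls and the helper shapes_agree are hypothetical names made up
for the illustration.  All three shapes are equivalent at the source
level, so the optimizer is free to emit the same or different code for
each of them; what it actually emits depends on the compiler version,
the target and the flags, so compare the -S output yourself.

```c
#include <assert.h>

/* Hypothetical stubs standing in for foo () and bar ().  */
static int foo_result;
static int bar_calls;

static int foo (void) { return foo_result; }
static int bar (void) { return ++bar_calls; }

/* Shape 1: jump to a common exit, as in x1.c.  */
static int
baz_goto (void)
{
  int ret = 1;

  if (foo ())
    {
      ret = 0;
      goto out;
    }

  bar ();

 out:
  return ret;
}

/* Shape 2: early return, as in x2.c.  */
static int
baz_return (void)
{
  if (foo ())
    return 0;

  bar ();
  return 1;
}

/* Shape 3: no explicit jump in the source at all.  */
static int
baz_plain (void)
{
  int ok = !foo ();

  if (ok)
    bar ();

  return ok;
}

/* Returns 1 if all three shapes return the same value and call
   bar () the same number of times for the given foo () result.  */
static int
shapes_agree (int foo_value)
{
  int r1, r2, r3, c1, c2, c3;

  foo_result = foo_value;
  bar_calls = 0; r1 = baz_goto ();   c1 = bar_calls;
  bar_calls = 0; r2 = baz_return (); c2 = bar_calls;
  bar_calls = 0; r3 = baz_plain ();  c3 = bar_calls;

  /* Sanity check against the intended behaviour of baz ().  */
  assert (r1 == (foo_value ? 0 : 1));
  return r1 == r2 && r2 == r3 && c1 == c2 && c2 == c3;
}
```

  Calling shapes_agree () with a zero and a non-zero argument covers
both paths through the function.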

  Anyway, in any case DON'T LIE TO THE COMPILER! There's ONLY ONE
reason people generally write better assembler code than the compiler
-- because they know more about the program than the compiler does.
Thus, help the compiler by telling it more about the program.  Avoid
casts.  Use ``restrict''.  Use whatever #pragma's there are.  Use
__attribute__.  Use __builtin_expect.  If you want to do a return, do
a return.  The compiler can infer a lot more from a ``return'' than
from a ``goto''.
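
  A rough sketch of what "telling the compiler more" can look like,
assuming a C99 compiler for ``restrict'' and GCC for the other two.
The functions scale and sum, and the macro names likely, unlikely and
PURE, are made up for the illustration; whether any of these hints
changes the generated code depends on the compiler and the target.

```c
#include <assert.h>
#include <stddef.h>

/* GCC-specific hints; on other compilers they expand to nothing.
   The macro names are a convention assumed here.  */
#if defined (__GNUC__)
# define likely(x)   __builtin_expect (!!(x), 1)
# define unlikely(x) __builtin_expect (!!(x), 0)
# define PURE        __attribute__ ((pure))
#else
# define likely(x)   (x)
# define unlikely(x) (x)
# define PURE
#endif

/* ``restrict'' promises that dst and src do not overlap, so the
   compiler need not reload src[i] after each store to dst[i].  */
static void
scale (int *restrict dst, const int *restrict src, size_t n, int k)
{
  size_t i;

  /* Cheap check for the most obvious violation of the promise.  */
  assert (n == 0 || dst != src);

  for (i = 0; i < n; i++)
    dst[i] = src[i] * k;
}

/* pure: no side effects, the result depends only on the arguments
   and on memory, so repeated calls may be merged.  */
static int sum (const int *v, size_t n) PURE;

static int
sum (const int *v, size_t n)
{
  int s = 0;
  size_t i;

  /* Tell the compiler the NULL case is the rare one.  */
  if (unlikely (v == NULL))
    return 0;

  for (i = 0; i < n; i++)
    s += v[i];

  return s;
}
```

  Note that a wrong hint is just another way of lying to the compiler:
passing overlapping arrays to scale () is undefined behaviour.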

  And note that this is not against gotos.  They have valid uses, I
personally use them often (almost exclusively as a poor man's
exception handling).
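
  For the record, a minimal sketch of what I mean by that: the labels
at the end undo, in reverse order, whatever was set up before the
failure.  struct pair, pair_init and pair_destroy are hypothetical
names invented for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct pair
{
  char *a;
  char *b;
};

/* Returns 0 on success, -1 on failure.  On failure nothing is
   leaked and both members are NULL.  */
static int
pair_init (struct pair *p, const char *a, const char *b)
{
  assert (p != NULL);	/* a programming error, not a runtime one */

  p->a = NULL;
  p->b = NULL;

  if (a == NULL || b == NULL)
    goto out;

  p->a = malloc (strlen (a) + 1);
  if (p->a == NULL)
    goto out;

  p->b = malloc (strlen (b) + 1);
  if (p->b == NULL)
    goto out_free_a;

  strcpy (p->a, a);
  strcpy (p->b, b);
  return 0;

  /* "Exception handlers": unwind in reverse order of setup.  */
 out_free_a:
  free (p->a);
  p->a = NULL;
 out:
  return -1;
}

static void
pair_destroy (struct pair *p)
{
  free (p->a);
  free (p->b);
  p->a = NULL;
  p->b = NULL;
}
```

  Here the goto carries information a bare ``return'' could not: how
much has to be undone before leaving.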

~velco
