Re: [RFC] Convert builin-mailinfo.c to use The Better String Library.

David Kastrup wrote:
Walter Bright <boost@xxxxxxxxxxxxxxx> writes:

A canonical example is that of a loop. Consider a simple C loop over
an array:

void foo(int array[10])
{
    for (int i = 0; i < 10; i++)
    {   int value = array[i];
        ... do something ...
    }
}

It's simple, but it has a lot of problems:

1) i should be size_t, not int

Wrong.  size_t is for holding the size of memory objects in bytes, not
in terms of indices.  For indices, the best variable is of the same
type as the declared index maximum size, so here it is typeof(10),
namely int.

The easiest way to show the error is to consider the code being ported to a typical 64-bit C compiler. There, ints are still 32 bits, yet an array can have more elements than a 32-bit index can count. You're right that what we want is typeof(array dimension), but there is no way to do that automatically in C, which is my point. If the array dimension changes, you have to carefully check that every loop depending on its type is updated, too.

size_t will always work, however, making it a better choice than int, at least for C.
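
To make the index-type point concrete, here is a minimal sketch in C. The function name sum_ints and the explicit len parameter are made up for illustration and are not part of the original example.

```c
#include <stddef.h>

/* Hypothetical helper: sums len ints.
   The index is size_t, so it can count every element of any array that
   fits in memory, even where int is only 32 bits wide. */
long long sum_ints(const int *array, size_t len)
{
    long long total = 0;
    for (size_t i = 0; i < len; i++)
        total += array[i];
    return total;
}
```

Nothing in the loop has to be revisited if the array later grows past what an int can index.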

2) array is not checked for overflow

Why should it?

Because the 10 array dimension is not statically checked in C. I could pass it a pointer to 3 ints without the compiler complaining. This makes it a potential maintenance problem. Also, the maintenance programmer may change the array dimension in the function signature, but overlook changing it in the for loop. Again, a maintenance problem.
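
For comparison, C can be made to check the dimension, at a price. The sketch below is one possible workaround, not something proposed earlier in this thread; sum_ten is a hypothetical name. It takes a pointer to an array of exactly 10 ints, so the dimension becomes part of the parameter type, and the loop bound is derived from that type rather than repeated by hand.

```c
#include <stddef.h>

/* Hypothetical variant of foo: the parameter is a pointer to an array of
   exactly 10 ints, so the dimension is part of the type. */
long sum_ten(int (*array)[10])
{
    /* Element count computed from the type, so the loop bound cannot
       drift away from the dimension in the signature. */
    size_t count = sizeof *array / sizeof (*array)[0];
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += (*array)[i];
    return total;
}
```

With this signature, passing the address of an int[3] has the wrong pointer type and draws a diagnostic from the compiler. The cost is the clumsier (*array)[i] syntax and a function that accepts only one fixed size.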


3) 10 may not be the actual array dimension

Your point is?

Array buffer overflow errors are commonplace in C, because array dimensions are not automatically checked at either compile time or run time. This is an expensive problem. Some C APIs try to deal with this by passing a second argument giving the array's dimension (snprintf, for example), but the practice is sporadic rather than a convention. Because it is extra work for the programmer, it inevitably doesn't get done.
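
As a small illustration of the snprintf style, here is a sketch in which the caller passes the buffer's capacity alongside the buffer. The helper name format_greeting and the message text are invented for the example.

```c
#include <stdio.h>

/* Hypothetical helper: formats a greeting into buf without ever writing
   more than cap bytes. Returns 1 if the whole text fit, 0 if snprintf
   truncated it or reported an error. */
int format_greeting(char *buf, size_t cap, const char *name)
{
    /* snprintf returns the length the full text would have needed,
       not counting the terminating NUL. */
    int needed = snprintf(buf, cap, "hello, %s", name);
    return needed >= 0 && (size_t)needed < cap;
}
```

The check only works because the caller remembered to pass sizeof of the right buffer; nothing in the language ties cap to buf.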


4) may be more efficient to step through the array with pointers,
rather than indices

No.  It is a beginners' and advanced users' mistake to think using
pointers for access is a good idea.  Trivial optimizations are what a
compiler is best at, not the user.  Using pointer manipulation will
more often than not break loop unrolling, loop reversal, strength
reduction and other things.

C compilers vary widely in the optimizations they'll do for simple loops, and often enough I see programmers try to take such matters into their own hands. I agree with you on that, and suggest that the language should not tempt the user into such optimizations.

5) type of array may change, but the type of value may not get
updated

Huh?

Let's say our fearless maintenance programmer decides to make it an array of longs, not an array of ints. He overlooks changing the type of value in the loop. Suddenly, things subtly break because of overflows. Or maybe he changed the int to an unsigned, and now the divides in the loop give different answers. Etc. There really isn't any compiler/language help in finding these kinds of problems.
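
The unsigned case can be shown with a minimal sketch. The two functions below are hypothetical stand-ins for the same loop body before and after the type change; only the declared type differs.

```c
/* Loop body as originally written: the element type is int. */
int half_signed(int x)
{
    return x / 2;
}

/* Same expression after the type was changed to unsigned.
   A negative input is first converted to a large positive value,
   so the division no longer yields the signed result. */
unsigned half_unsigned(unsigned x)
{
    return x / 2;
}
```

Feeding -8 to the first gives -4. Feeding the same bit pattern to the second gives a large positive number that is not the unsigned representation of -4, and no diagnostic is required for any of it.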


6) crashes if array is NULL

Certainly.  Your point being?

I consider an array that is NULL to have no members, so instead of crashing the loop should execute 0 times.
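
In C this behavior has to be built by hand. A minimal sketch, assuming a made-up struct int_slice that pairs the pointer with its element count:

```c
#include <stddef.h>

/* Hypothetical slice type: a pointer together with its element count. */
struct int_slice {
    const int *ptr;
    size_t len;
};

/* Sums the slice. A NULL pointer with len 0 is simply an empty array:
   the loop body never runs and ptr is never dereferenced. */
long sum_slice(struct int_slice s)
{
    long total = 0;
    for (size_t i = 0; i < s.len; i++)
        total += s.ptr[i];
    return total;
}
```

This only holds as long as every producer of an int_slice keeps len at 0 whenever ptr is NULL, which is again a convention the compiler does not enforce.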


7) only works with arrays and pointers

Since there are only arrays and pointers in C, not really a restriction.

C has structs, too, as well as more complicated user-defined collections. Essentially, you cannot (simply) write generic algorithms in C, because you cannot (simply) generically express iteration. Of course, you can still express anything in C if you're willing to work hard enough to get it. Me, I'm too lazy <g>. It's like why I can't play chess - every time I try to play it, I instead think about writing a program to do the hard work for me.
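
To give an idea of what "working hard enough" looks like, here is a sketch of generic iteration in the style of qsort, using void pointers and a callback. The names for_each, visit_fn and add_int are invented for this example.

```c
#include <stddef.h>

/* Hypothetical callback type: receives one element and a caller context. */
typedef void (*visit_fn)(const void *elem, void *ctx);

/* Visits count elements of elem_size bytes each, starting at base.
   The element type is erased, so the compiler cannot check that
   elem_size or the callback match the real array. */
void for_each(const void *base, size_t count, size_t elem_size,
              visit_fn visit, void *ctx)
{
    const unsigned char *p = base;
    for (size_t i = 0; i < count; i++)
        visit(p + i * elem_size, ctx);
}

/* Example visitor: adds an int element to the long that ctx points to. */
void add_int(const void *elem, void *ctx)
{
    *(long *)ctx += *(const int *)elem;
}
```

A call such as for_each(a, 3, sizeof a[0], add_int, &total) does work on an int array, but every cast is a place where a type change can go unnoticed, and collections that are not contiguous arrays need yet another scheme.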


As a programmer, I'm specifying exactly what I want to happen without
much extra puffery. It's less typing, simpler, and more resistant to
bugs.

1) correct loop index type is selected based on the type of array
2) arrays carry with them their dimension, so foreach is guaranteed to
step through the loop the correct number of times
3) implementation decides if pointers will do a better job than
indices, based on the compilation target
4) type of value is inferred automatically from the type of array, so
no worries if the type changes
5) Null arrays have 0 length, so no crashing
6) works with any collection type

Most of those are toy concerns.  They prevent problems that don't
actually occur much in practice.

I beg to differ - buffer overflow bugs are common and expensive. The nice thing about the D loop is it is LESS typing than the C one - you get the extra robustness for free.

Let's look at the code gen for the inner loop for C:

L8:             push    [EBX*4][ESI]
                call    near ptr _bar
                inc     EBX
                add     ESP,4
                cmp     EBX,0Ah
                jb      L8

and for D:

LE:            mov     EAX,[EBX]
               call    near ptr _D4test3barFiZv
               add     EBX,4
               cmp     EBX,ESI
               jb      LE

I think you can see that performance isn't an impediment.
