C coding practices

Hello everyone, 

There are a few style/standards practices used in the libcob C code that
I find questionable.  I have found no written rationale for them.  I'd
like to understand their purpose and document them, or eliminate them.

Topics:

1.  Unnecessary "portability"
2.  Unnecessary casts
3.  Pervasive likely/unlikely
4.  Overuse of macros

Portability

Some efforts at portability are probably no longer necessary, if
they ever were. For example:

    #if defined (_MSC_VER) && COB_USE_VC2008_OR_GREATER
    #define ONCE_COB \
    	__pragma( warning(push) )		\
    	__pragma( warning(disable:4127) )	\
    	while (0) \
    	__pragma( warning(pop) )
    #else
    #define ONCE_COB while (0)
    #endif

According to the C standard, any unrecognized pragma is ignored.  A
#pragma directive cannot appear in a macro expansion, which is why
Microsoft's __pragma exists, but C99's standard _Pragma operator
serves the same purpose and should compile regardless of Windows.
But, I wonder, is warning 4127 of any use anyway?  Maybe it should be
disabled globally.  I've never seen "do {} while (0)" macros guarded
in other code bases.

If we want the warning, can we at least see if _Pragma can be used
instead?  Which supported compiler produces a diagnostic for
unrecognized pragmas?
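
If instead the warning is judged useless, the whole guard collapses.
A minimal sketch, assuming C4127 has no defenders (the _MSC_VER guard
just keeps other compilers from complaining about an unknown pragma):

    #ifdef _MSC_VER
    /* C4127: "conditional expression is constant", which fires on the
     * idiomatic do { ... } while (0) construct, among others */
    #pragma warning (disable:4127)
    #endif

    #define ONCE_COB while (0)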

Along the same lines, but not a banner I want to march under today, is
the nefarious Microsoft backslash.  It's not widely known that Win32
functions accept '/' as a path separator.  Logic to substitute '\' in
the name of portability is usually not needed.  If you're parsing the
filename, looking for a separator, you're probably doing it wrong.  
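
For the record, a one-line sketch (path hypothetical) that works
unchanged on Windows and POSIX, no translation required:

    FILE *log = fopen ("logs/run.txt", "a");	/* Win32 accepts '/' here */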

Casts

There are far too many casts.  Casts should be restricted to essential
cases; they should signal exceptional use.  Unnecessary casts clutter
the code and obscure the intent.  Examples:

    putc ('\n', (FILE *)f->file);

Here, f->file is defined as void*.  C converts void* to any pointer
type. Casting to FILE* is pointless.
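
Without the cast, the call is simply:

    putc ('\n', f->file);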

    memset ((void *)&lock, 0, sizeof (struct flock));

memset takes void* as its first argument.  C converts any pointer to
void*.  Casting to void* is pointless. IMO this is better: 

    memset (&lock, 0, sizeof(lock));

because it remains correct even if the type of "lock" changes. I prefer
"sizeof(variable)" to "sizeof(type)" except when there's no variable 
in play, such as when defining a structure.  
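
For illustration, a hypothetical layout where no variable is in play
and sizeof(type) is the only choice:

    struct packet {
    	unsigned int	len;
    	char		body[512 - sizeof (unsigned int)];
    };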

Another example: 

    ret = fwrite (f->record->data, size, (size_t)1, (FILE *)f->file);

fwrite takes size_t as its 3rd argument.  The literal "1" will be
promoted automatically to that type.  The cast is pointless.
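
The cast-free call is identical in effect:

    ret = fwrite (f->record->data, size, 1, f->file);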

A pointless cast can become a dangerous one.  If a function parameter
is changed from void* to foo*, and the caller still casts its argument
to void*, type checking is defeated, and an error is likely to slip
through.  If the variable being cast is later declared const, the cast
silently discards the qualifier, again quite likely leading to an
error.
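
A hypothetical sketch of the first hazard.  Suppose the parameter is
tightened from void* to struct foo*, and a stale cast remains at one
call site:

    void process (struct foo *p);	/* was: void process (void *p); */

    struct bar *b = get_bar ();		/* hypothetical */
    process ((void *)b);	/* wrong type, but the cast compiles silently */
    process (b);		/* without the cast, a required diagnostic */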

Implicit conversion to/from void* is a feature of C, not C++.  If the
cast was introduced to satisfy a C++ compiler, the right choice is to
use a C compiler instead.  

In a similar vein, many casts seem intended to quell warnings about
signed/unsigned comparisons and assignments.  Such warnings are most
prominent in Microsoft compilers, and I sometimes think they exist
just to make C and C++ harder to use (to sow preference for C#).
While I imagine
there are documented cases of a significant error caused by
using a signed variable in an unsigned context (or vice versa), in the
*vast* majority of cases the warning is a pointless scold.  I would say
its best effect is to encourage the use of size_t for unsigned
magnitudes.
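
A sketch of that habit (p and size hypothetical, size declared
size_t):

    size_t	i;
    for (i = 0; i < size; i++) {	/* both operands unsigned: no cast, no warning */
    	putc (p[i], f->file);
    }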

I recommend disabling the warning, removing the casts, and thinking
while programming.  

Likely/unlikely

Almost every branch test looks like this:

    if (unlikely (cobsetptr->cob_ls_nulls != 0)) {

There are 6 definitions of the "unlikely" macro in the tree; the GCC
form is shown below.  The intention is evidently to help the compiler
lay out branches more efficiently.

    #define likely(x)	__builtin_expect((long int)!!(x), 1L)
    #define unlikely(x)	__builtin_expect((long int)!!(x), 0L)
    
Given that we know how notoriously bad programmers are at identifying
efficiency concerns a priori, I seriously doubt the efficacy of the
pervasive use of likely/unlikely.  I would therefore like to know: has
it been tested?  If so, what is the measured gain from their use?

If no significant gain has been measured with modern compilers, I would
recommend abandoning the use of likely/unlikely, and mechanically
stripping them from the code.  If they don't add anything, by
definition they constitute unnecessary complexity.
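
Stripping can even be staged without touching any call site: define
the macros as identity, benchmark, then delete mechanically:

    /* transitional definitions, for measuring with the hints disabled */
    #define likely(x)	(x)
    #define unlikely(x)	(x)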

Macros

There are far too many macros used, often to bad effect.  For example:

    for (i = 0; i < (int)size; ++i, ++p) {
    	if (*p < ' ') {
    		COB_CHECKED_PUTC (0, (FILE *)f->file);
    	}
    	COB_CHECKED_PUTC ((*p), (FILE *)f->file);
    }

This may seem like a good idea, but it's not.  The macro returns!

    #define COB_CHECKED_PUTC(character,fstream)	do { \
    		ret = putc ((int)character, fstream); \
    		if (unlikely (ret != (int)character)) { \
    			return errno_cob_sts (COB_STATUS_30_PERMANENT_ERROR); \
    		} \
    	} ONCE_COB /* LCOV_EXCL_LINE */

Macros that return violate the Principle of Least Astonishment.
Because they hide the return statement, the person reading the
invocation does not see the flow of control.  

The same effect could be had more idiomatically and clearly with a
normal function call:

    if( 0 != cob_checked_splat(p, size, f->file) ) {
    	return errno_cob_sts (COB_STATUS_30_PERMANENT_ERROR);
    }

where cob_checked_splat() is:

    int cob_checked_splat( void *buf, size_t len, FILE* file ) {
        char *end = (char *)buf + len;
        for( char *p = buf; p < end; p++ ) {
            if( *p < ' ' && EOF == putc(0, file) ) {
                return EOF;
            }
            if( EOF == putc(*p, file) ) {
                return EOF;
            }
        }
        return 0;
    }

* putc returns EOF on error, regardless of input
* no cast needed
* no index ("i") needed

This way, the caller controls the return code, and the buffer-writer
controls the loop.  Simpler for both parties, clearer flow of control,
and the function is more reusable than the macro.

Also, I'm skeptical of the "*p < ' '" check.  Do we want to write ASCII
DEL (0x7f)?  I think we may want isprint(3). 
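
If so, a sketch using isprint(3) (whose argument must be an unsigned
char value or EOF):

    #include <ctype.h>

    putc (isprint ((unsigned char)*p) ? *p : 0, file);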

To be clear: I favor portable, standard code.  It has been the
experience of many projects that standard code is portable code; fewer
and fewer workarounds are needed.  Not every warning turned on by
default need be honored.  For example, SQLite is highly portable, but
cannot satisfy the warnings produced by every conceivable compiler.
Sometimes the right answer to "I get a warning" is, "turn off that
warning".  

There may have been issues in the past that are no longer relevant or,
as I said, there may have been decisions rightly made that simply
aren't written down. I think it's worth our while to understand the
difference, and not simply perpetuate unnecessary complexity.  

--jkl