Re: gcc structures

Donald R Laster Jr <laster@xxxxxxxxxxx> · Tue, 10 Sep 2013 15:02:13 -0400

  Here is some more information on structure alignments and a little bit of history based upon my experience with 8 to 10 different vendor/versions of "C" and C++ compilers over the years.

  The alignment issue is one reason why, when I am creating structures in "C" and C++, I pay close attention to the sizes of the variables in the structure - especially arrays and other structures.  If the data is going to be written to files or transmitted between systems it becomes even more important.  CORBA and other transport mechanisms will hide many of these issues from the application programmer in some cases.

  What I have found is that all of the compilers will, by default, generally place variables on the required or expected alignment based upon the variable type (int, float, double) and the hardware machine type.  If the structure remains local to the system in memory you would normally never notice any issues.  If you start moving the data around, into files or to different systems, you can run into problems.  If you were to use "sizeof(struct name)" instead of 100, you would probably not have noticed the issue, since the size of the data being read and written would have included the alignment fullword (4 bytes) between g1 and g2.  Only if you looked at the size of the file would you have had a question.

  In some cases I have manually placed alignment variables to ensure the alignment placement is obvious.  It becomes very important when dealing with low level data accesses from files or across a network.  I have seen code written that transmits a structure across a network to a system with a different alignment requirement, and the results on the other end are not what is expected.  In some cases the size of the data transmitted is less or more than the other end expected.  The person writing the code counted the sizes of the individual variables and arrays instead of using the actual size of the structure.  Debugging the problem initially was not easy for various reasons.
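
  As a minimal sketch of the two points above (manual alignment variables, and counting member sizes versus using sizeof), consider the following.  The struct and function names are hypothetical and only for illustration, and the sizes in the comments assume a 4 byte int32_t and an 8 byte double.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record used only for illustration. */
struct rec_implicit {
    int32_t id;     /* 4 bytes */
    double  value;  /* 8 bytes; the compiler may insert 4 padding bytes before it */
};

/* Same fields with the padding written out, so the layout is visible in the source. */
struct rec_explicit {
    int32_t id;     /* offset 0 */
    int32_t pad0;   /* offset 4: explicit alignment filler */
    double  value;  /* offset 8 */
};

/* The mistake described above: adding up the member sizes by hand. */
size_t rec_member_sum(void)
{
    return sizeof(int32_t) + sizeof(double);   /* 12 under the stated assumptions */
}

/* What the compiler actually allocates for the struct. */
size_t rec_actual_size(void)
{
    return sizeof(struct rec_implicit);        /* 12 or 16, depending on the ABI */
}
```

  On a platform that aligns doubles to 8 bytes, rec_member_sum() and rec_actual_size() differ by 4, which is the kind of mismatch that shows up as a short or long transfer.  The explicit version has the same layout under either alignment rule, because the filler already puts value on an 8 byte boundary.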

  Consider this structure:

      struct  words {		/*	A		B  */
        signed long int   int4;	/*	0		0  */
        double            flt8;	/*	4		8  */
      };			/*size 12	  size 16  */

It may be 12 bytes long or 16 bytes long depending upon the architectures I have used.  On older 32 bit hardware (IBMs, DEC, CCUR (PE), Data General, Gould - think 1980s/early 1990s) platforms the offsets and size are from column A (what you expected, I believe).  The requirement was for 4 byte alignment of 32 bit and 64 bit values.  On newer versions of the compilers and on 64 bit hardware (later IBMs, later CCUR (PE), Sparc v8/v9, etc) the offsets and size are from column B.  Variables that are 8 bytes in size are generally placed on 8 byte boundaries by default today.  It makes it simpler to write compilers and avoids creating problems when moving code to different platforms.
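
  If you want to see which column a given compiler and target use, offsetof and sizeof will tell you.  The following is a small sketch, not a definitive test.  The function name is hypothetical, and I have assumed int32_t in place of "signed long int", because on LP64 systems (64 bit Linux, for example) a long is itself 8 bytes and would no longer be the 4 byte field the example intends.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct words {
    int32_t int4;   /* assumption: fixed 4-byte integer instead of signed long int */
    double  flt8;
};

/* Prints the member offsets and the total size; returns the total size. */
size_t report_words_layout(FILE *out)
{
    fprintf(out, "int4 at offset %zu\n", offsetof(struct words, int4));
    fprintf(out, "flt8 at offset %zu\n", offsetof(struct words, flt8));
    fprintf(out, "sizeof(struct words) = %zu\n", sizeof(struct words));
    return sizeof(struct words);
}
```

  Calling report_words_layout(stdout) should report offset 4 and size 12 where column A applies, and offset 8 and size 16 where column B applies.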

  The newer 64 bit systems expect variables that are 64 bits (8 bytes) in size (long longs, doubles) to be on 8 byte boundaries or alignment exceptions occur.  Thus the compiler places variables on the natural alignment based upon the size of the variable.  On Intel chips these data types can still be accessed on different alignments (to my knowledge) but the performance drops significantly since the chips have to do extra work to get the data to and from memory and the compiler has to generate more machine code to do the work as well.
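
  When a value has to be read from or written to an address that may not be suitably aligned (a packed file record or a network buffer, say), copying the bytes with memcpy avoids the alignment exception, since the value itself lives in a properly aligned local variable.  This is only a sketch and the function names are hypothetical.  It also assumes both ends use the same floating point format and byte order.

```c
#include <string.h>

/* Read a double from a buffer position that may not be 8-byte aligned. */
double load_double_unaligned(const unsigned char *p)
{
    double d;
    memcpy(&d, p, sizeof d);   /* byte copy: no aligned 8-byte access through p */
    return d;
}

/* Write a double to a buffer position that may not be 8-byte aligned. */
void store_double_unaligned(unsigned char *p, double d)
{
    memcpy(p, &d, sizeof d);
}
```

  Casting the buffer pointer to (double *) and dereferencing it is the version that traps on strict-alignment hardware.  Compilers typically turn the memcpy into a single load or store where the hardware allows it.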

  Older compilers for Intel architectures did not care since the memory accesses were generally byte oriented and it did not matter if it was char (1), halfword (2), fullword (4) or doubleword (8) aligned.  This goes back to the 8086, 80286 and Z80 days.  

  Another thing you need to be aware of is the "Little-Endian" and "Big-Endian" issue, especially if you are moving data across platforms - such as Intel/AMD to SPARC/IBM or vice-versa.
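
  One common way to sidestep the byte order issue is to pick a fixed order for the file or wire format and build the bytes with shifts, rather than writing the in-memory integer directly.  The sketch below assumes big-endian was chosen as the external order, and the function names are hypothetical.

```c
#include <stdint.h>

/* Store v into out[0..3] most significant byte first, whatever the host order is. */
void put_u32_be(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Rebuild the value from four bytes stored most significant byte first. */
uint32_t get_u32_be(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |
            (uint32_t)in[3];
}
```

  Because the shifts operate on the value and not on its memory representation, the same code produces the same bytes on Intel/AMD and on SPARC, and it also avoids any alignment requirement on the buffer.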

  Hope this helps some.

   Don

David Brown wrote:
> On 08/09/13 15:22, JimJoyce wrote:
>> Thanks, Jonathan, for your speedy reply.
>>
>> However, I'm surprised, That 'C' can pad structures as it sees fit.
>> I thought the point and value of user-defined structures was to suit user's
>> needs.
>> not the whim of the compiler..
> 
> The C compiler is allowed some leeway.  I'm not even sure if it is
> required to keep the struct elements in the same order (there was a gcc
> option to allow it to change the order in certain circumstances, but I
> gather it has now been removed since it worked badly with LTO) - it just
> has to produce code that /acts/ as though the order is as given.
> 
> In particular, the compiler will normally pad elements in a struct as
> necessary to fit the alignment requirements of your architecture.  On
> some architectures, incorrect alignment will mean the program does not
> work - it will either work with incorrect data, or trip a processor
> exception.  On others, incorrect alignment will merely mean the code
> runs slowly.
> 
> When you are concerned about the exact format of your structs, I
> strongly recommend using the "-Wpadded" switch so that the compiler will
> inform you of any added padding bytes.  Then you can adapt your struct
> to fit - possibly by adding explicit "padding" entries to make
> everything fit correctly.
> 
> David
> 
>