Re: Casting and dereferencing of pointer

Kalle Olavi Niemitalo <kon@xxxxxx> · Sat, 16 Aug 2008 21:24:08 +0300

sed <sed.nivo@xxxxxxxxx> writes:

> static void _parseItems(const unsigned char *pBuffer)
> {
>   unsigned int itemSize;
>   itemSize = *((unsigned int*)pBuffer); // this give system fault
>   memcpy(&itemSize, pBuffer, sizeof(unsigned int)); // this works well
>   .......
> }

On some processors, you get an exception or just wrong results
if you try to read a value from an address that is not properly
aligned for the type.  For example, a processor can require that
the address of any (signed or unsigned) int is divisible by 4.
Intel 8086 compatibles have historically allowed any alignment,
although even with them the code works faster if the alignment
is right.  GCC has an __alignof__ operator with which you can
check the alignment expected for a given type.

So, the *((unsigned int*)pBuffer) expression works portably only
if the correct alignment is somehow ensured.  The memcpy() call
works regardless of the alignment of the pBuffer pointer.

> static unsigned int hash_obj(struct object *obj, unsigned int n)
> {
> 	unsigned int hash = *(unsigned int *)obj->sha1;
> 	return hash % n;
> }

object.h defines struct object as:

| struct object {
| 	unsigned parsed : 1;
| 	unsigned used : 1;
| 	unsigned type : TYPE_BITS;
| 	unsigned flags : FLAG_BITS;
| 	unsigned char sha1[20];
| };

The structure has first an unsigned int that is split into
bitfields, and then unsigned char sha1[20].  If the compiler
works sensibly, then __alignof__(struct object) is the same as
__alignof__(unsigned int), and offsetof(struct object, sha1) is
sizeof(unsigned int); so if obj points to a correctly aligned
struct object, then obj->sha1 should also be correctly aligned
to be accessed as an unsigned int.

It seems somewhat brittle though: if someone adds a member
between flags and sha1, then the alignment may become wrong.
I think it would be good to add a comment in struct object
about the alignment requirement, or change hash_obj to use
memcpy (which might be slower).

> static int hashtable_index(const unsigned char *sha1)
> {
> 	unsigned int i;
> 	memcpy(&i, sha1, sizeof(unsigned int));
> 	return (int)(i % obj_hash_size);
> }

Here, the sha1 argument does not come from struct object and
might not be sufficiently aligned for unsigned int, so the
memcpy is necessary.  If you look with git blame, you'll find
the commit where the memcpy was added for just this reason.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html