Re: how gcc thinks `char' as signed char or unsigned char ?

John Love-Jensen <eljay@xxxxxxxxx> · Wed, 05 Mar 2008 08:11:22 -0600

Hi Tom,

> Presumably GetByte() would be responsible for ensuring its range is
> limited to 0..255 or -128..127, so the &0xFF is not required.

GetByte returns a byte (a typedef for plain char), and plain char does not
specify whether its range is 0..255 or -128..127, so the &0xFF is required.

> b < 200 would work just fine if GetByte() were spec'd properly.

GetByte is spec'd properly.

> It's more apt to think of "char" as a small integer type, not a
> "character type."  You could have a platform where char and int are the
> same size for instance.

I have worked on a platform where char is 32 bits, and on another where char
is 13 bits.  But that was quite a while ago.

It's more apt to think of char as holding character data, and that a byte
holds a byte (not a small integer).  And that the (b & 0xFF) converts the
byte into an int with a constrained range of 0..255.

> In two's complement it doesn't really matter, unless you multiply the
> type.  You'd get a signed multiplication instead of unsigned.  Which
> probably won't matter, but it's good to be explicit.  Also shifting
> works differently.  unsigned right shifts fill with zeros.  signed
> shifts may fill with zeros OR the sign bit.

The (b & 0xFF) will work correctly on both 1's complement and 2's complement
machines.
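
As a rough illustration of the shift concern, here is a sketch assuming
8-bit chars.  The names high_nibble and high_nibble_raw are hypothetical.
Masking first means the shift operates on a non-negative int, so the fill
question does not come up:

```c
typedef char byte;  /* plain char: signedness is implementation-defined */

/* Mask first: (b & 0xFF) is a non-negative int, so the right shift
   has a well-defined result with zeros shifted in. */
static int high_nibble(byte b)
{
    return (b & 0xFF) >> 4;
}

/* No mask: if char is signed and b is negative, the result of the
   right shift is implementation-defined. */
static int high_nibble_raw(byte b)
{
    return b >> 4;
}
```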

If the desire is to have a byte (the typedef'd identifier) represent an
octet, it will also work correctly on machines with bytes wider than 8 bits.
(In which case it may be more appropriate to use typedef char octet; as the
identifier.)

On platforms where a byte is narrower than 8 bits, and the desire is for a
byte to represent an octet, a char is not sufficient to hold an octet.
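
One way to make the width assumption explicit is a compile-time check on
CHAR_BIT from <limits.h>.  This is only a sketch, and octet_value is a
hypothetical helper name.  ISO C requires CHAR_BIT to be at least 8, so the
#error can only fire on a non-conforming implementation:

```c
#include <limits.h>

typedef char octet;  /* a char used to carry one octet */

#if CHAR_BIT < 8
#error "char is too narrow to hold an octet"
#endif

/* Keep only the low 8 bits, so the result is 0..255 even when char
   is wider than 8 bits. */
static int octet_value(octet o)
{
    return o & 0xFF;
}
```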

On all the platforms I work on these days, a byte is 8 bits.  I don't expect
that to change any time soon.

> It's IMO a better idea to use the unsigned type (that's why it exists)
> and write your functions so that their domains and co-domains are well
> understood.  If you can't figure out what the inputs to a function
> should be, it's clearly not clearly clearly documented.  ;-)

For a type that holds small integers, an unsigned char (or uint8_t from
<stdint.h>) is appropriate.

For a type that holds a byte, a typedef char byte; without regard to
signed-ness or unsigned-ness is appropriate.

For converting a byte to an unsigned char (or uint8_t), which is a small
integer, (b & 0xFF) is appropriate.
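
Put together, the three cases might look like this in C.  This is a sketch
assuming 8-bit chars and a platform that provides uint8_t; byte_to_u8 and
sum_bytes are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>

typedef char byte;  /* holds a byte; signedness deliberately unspecified */

/* Convert a byte to a small unsigned integer in the range 0..255. */
static uint8_t byte_to_u8(byte b)
{
    return (uint8_t)(b & 0xFF);
}

/* Sum a buffer of bytes as small unsigned integers, masking each one
   so the total does not depend on the signedness of plain char. */
static unsigned long sum_bytes(const byte *buf, size_t n)
{
    unsigned long total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += byte_to_u8(buf[i]);
    return total;
}
```

For a two-element buffer holding the bit patterns 0xFF and 0x01, sum_bytes
returns 256 regardless of whether plain char is signed.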

IMO.  YMMV.

Sincerely,
--Eljay