gcc-help@xxxxxxxxx writes:

> When I compile this code:
>
> #include <stdio.h>
>
> int main(void)
> {
>     char c = '?; /* ISO-8859-1 0xFC */
>
>     printf("%c\n", c);
>
>     return 0;
> }

This source file did not make it through the mail system.  I assume
that you mean for it to be

    char c = 'X';

where in the source file X is the single byte 0xfc.

> with gcc 4.3.2 under Linux with the locale specifying UTF-8 encoding,
> but the source file having ISO-8859-1 encoding, I don't get any
> diagnostics, and the output of the printf is a binary 0xFC.  I get
> the same results if I compile with
>
>     -finput-charset=iso8859-1 -fexec-charset=iso8859-1
>
> or
>
>     -finput-charset=utf-8 -fexec-charset=utf-8

The compiler does not attempt to validate the contents of a character
constant or string.  In both of these cases the input and execution
character sets are the same, so you have not asked for any character
set conversion, and the compiler has not applied any.

> My understanding is that gcc should default to UTF-8 source encoding,
> and should give a diagnostic when it encounters the illegal UTF-8
> start byte of 0xFC.

That is not how the compiler works today.  By default, the compiler
applies no conversion.  That is, it assumes that your input is valid,
and it takes it unchanged.

> I get the expected diagnostic if I compile with
>
>     -finput-charset=utf-8 -fexec-charset=iso8859-1
>
> (converting to execution character set: Invalid argument)

There you go.

Most people value compilation speed.  Most people do not write
invalid character strings in their programs.  So overall I think gcc
is making a sensible choice in not bothering to validate character
strings in the input file.

Ian
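
P.S.  If what you ultimately want is to emit that ISO-8859-1 byte no
matter how the source file itself is encoded, one approach (just a
sketch, not the only option) is to spell the byte as a numeric
escape.  A hex escape such as '\xfc' names the byte value in the
execution character set directly, so it is not run through the
source-to-execution character set conversion, and the source file
stays plain ASCII:

    #include <stdio.h>

    int main(void)
    {
        /* '\xfc' gives the byte value directly; it is not affected
           by -finput-charset/-fexec-charset conversion, and the
           source file contains only ASCII characters. */
        char c = '\xfc';    /* u-umlaut in ISO-8859-1 */

        printf("%c\n", c);  /* emits the single byte 0xFC */

        return 0;
    }

That keeps the program's output independent of the locale and of
whatever encoding your editor or mail system happens to use.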