On 14/12/2022 11:23, Alejandro Colomar wrote:
On 12/14/22 11:52, Ian Abbott wrote:
'@' isn't included in C's basic character set though. '&' is
available.
Just a curious question from an ignorant: what's the difference
between the basic character set and the source character set?
The source character set may contain locale-specific characters
outside the basic source character set.
Actually, there are two basic character sets - the basic source
character set and the basic execution character set (which includes
the basic source character set plus a few control characters). The
source character set and/or execution character set may contain
locale-specific, extended characters outside the basic character set.
https://port70.net/~nsz/c/c11/n1570.html#5.2.1
I still have a small doubt. C23 added '@' to the source character set,
but it seems to be a second-class citizen:
The execution character set may also contain multibyte characters, which
need not have the same encoding as for the source character set. For
both character sets, the following shall hold:
— The basic character set, @, $, and ` shall be present and each
character shall be encoded as a single byte.
What's the difference, and why isn't it part of the basic character
set? Maybe because not all keyboards have those three characters?
I think the inability to type certain characters in the basic source
character set is the reason why the language contains the horrible
trigraph sequences (removed in C23, and already gone in draft N3054),
and the slightly less horrible digraph tokens.
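For anyone who has not run into digraphs, here is a minimal sketch of
what they look like in practice. It assumes a compiler in C99 mode or
later (digraphs were added by the 1995 amendment to C90); the function
name sum3 is made up purely for this illustration.

```c
#include <stddef.h>

/* Digraphs are alternative spellings of punctuator tokens:
 *   <:  and  :>   stand for  [  and  ]
 *   <%  and  %>   stand for  {  and  }
 *   %:            stands for #
 * sum3 is a hypothetical name used only for this example. */
int sum3(void)
<%
    int a<:3:> = <% 1, 2, 3 %>;   /* same as: int a[3] = { 1, 2, 3 }; */
    int total = 0;

    for (size_t i = 0; i < sizeof a / sizeof a<:0:>; i++)
        total += a<:i:>;          /* same as: total += a[i]; */

    return total;
%>
```

Unlike trigraphs, digraphs are ordinary tokens, so they are not
replaced inside string literals or character constants.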
Here is the rationale for inclusion of @ and $ in the source and
execution character sets, but ` is only mentioned briefly as an also-ran
at the end of the document in section "Do we also want to add ` in the
same way as @ and $?":
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2701.htm
The rationale for exclusion of @ and $ characters from the basic
character set is given in this paragraph from the document:
"""
By requiring @ and $ in the source and execution character set we, reach
the goal of making them useable in comments and string literals. By not
adding them to the basic source character set, we protect the freedom of
implementations of allowing or disallowing them in identifiers, and
avoid inconsistency or incompability regarding the use of universal
character names (currently the use of universal character names for
characters in the basic source character set is not allowed, so adding
characters to the basic source character set without lifting that
restriction could break existing code).
"""
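As a rough illustration of that paragraph, here is a small sketch. It
assumes an implementation that follows the C23 requirement that these
three characters are present and single-byte, with an ASCII-compatible
execution character set (which I believe is the default for GCC and
Clang). The names extras, extras_size and ucn_matches_plain are
hypothetical and exist only for this example.

```c
#include <string.h>

/* '@', '$' and '`' used inside a string literal: three single-byte
 * characters followed by the terminating null character. */
static const char extras[] = "@$`";

/* Hypothetical helper: size in bytes of the array above. */
size_t extras_size(void)
{
    return sizeof extras;
}

/* Hypothetical helper: returns 1 if the universal character name
 * spellings denote the same characters as the plain spellings.
 * U+0040, U+0024 and U+0060 are the only code points below 00A0 that
 * the standard allows in a universal character name, which is possible
 * because they are not in the basic character set. */
int ucn_matches_plain(void)
{
    return strcmp("\u0040\u0024\u0060", "@$`") == 0;
}
```

Whether '$' or '@' is accepted in an identifier is still up to the
implementation, so the sketch deliberately avoids that.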
I guess it was decided to add all three proposed characters during the
Jan/Feb 2022 virtual meeting of WG14 as mentioned here:
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2913.htm
The first C2x draft that incorporated the change is this one:
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2912.pdf
--
Ian Abbott <abbotti@xxxxxxxxx>