Re: [PATCH] scanf.3: Do not mention the ERANGE error

Ian Abbott <abbotti@xxxxxxxxx> · Wed, 14 Dec 2022 14:10:55 +0000

On 14/12/2022 11:23, Alejandro Colomar wrote:

On 12/14/22 11:52, Ian Abbott wrote:

'@' isn't included in C's basic character set though.  '&' is 
available.

Just a question out of curiosity and ignorance:  what's the difference 
between the basic character set and the source character set?

The source character set may contain locale-specific characters 
outside the basic source character set.

Actually, there are two basic character sets - the basic source 
character set and the basic execution character set (which includes 
the basic source character set plus a few control characters).  The 
source character set and/or execution character set may contain 
locale-specific, extended characters outside the basic character set.

https://port70.net/~nsz/c/c11/n1570.html#5.2.1

I still have a small doubt.  C23 added '@' to the source character set, 
but it seems to be a second-class citizen:

  The execution character set may also contain multibyte characters, which
  need not have the same encoding as for the source character set. For both
  character sets, the following shall hold:
  - The basic character set, @, $, and ` shall be present and each character
    shall be encoded as a single byte.
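
As a rough illustration of the "single byte" requirement quoted above, here 
is a minimal sketch.  The function name is made up for this example, and 
before C23 the presence of these three characters was up to the 
implementation, so this assumes an ASCII-compatible compiler:

```c
#include <assert.h>
#include <string.h>

/* Three characters plus the terminating null: 4 bytes if each of
 * '@', '$' and '`' is encoded as a single byte. */
static_assert(sizeof "@$`" == 4, "expected one byte per character");

/* Hypothetical helper: returns 1 if the literal holds exactly three
 * single-byte characters at run time as well. */
int extra_chars_are_single_bytes(void)
{
    return strlen("@$`") == 3;
}
```

The static_assert rejects the translation unit at compile time if the 
assumption does not hold; the function repeats the check at run time.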

What's the difference, and why isn't it part of the basic character 
set?  Maybe because not all keyboards have those three characters?

I think the inability to type certain characters in the basic source 
character set is the reason why the language contains the horrible 
trigraph sequences (no longer valid since the C23 final draft N3054), 
and the slightly less horrible digraph tokens.
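
For anyone who has not met the digraph tokens, here is a minimal sketch 
(the function name is invented for illustration).  The tokens <: :> <% %> 
behave exactly like [ ] { }, and %: can stand in for #:

```c
#include <assert.h>
#include <stddef.h>

/* Sums an array, spelling the braces and brackets as digraphs:
 *   <%  %>  are  {  }
 *   <:  :>  are  [  ]
 */
int sum_with_digraphs(const int *values, size_t count)
<%
    assert(values != NULL || count == 0);

    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values<:i:>;
    return total;
%>
```

Unlike trigraphs, digraphs are ordinary tokens, so they are not replaced 
inside string literals or character constants.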

Here is the rationale for inclusion of @ and $ in the source and 
execution character sets, but ` is only mentioned briefly as an also-ran 
at the end of the document in section "Do we also want to add ` in the 
same way as @ and $?":

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2701.htm

The rationale for exclusion of @ and $ characters from the basic 
character set is given in this paragraph from the document:

"""
By requiring @ and $ in the source and execution character set we, reach 
the goal of making them useable in comments and string literals. By not 
adding them to the basic source character set, we protect the freedom of 
implementations of allowing or disallowing them in identifiers, and 
avoid inconsistency or incompability regarding the use of universal 
character names (currently the use of universal character names for 
characters in the basic source character set is not allowed, so adding 
characters to the basic source character set without lifting that 
restriction could break existing code).
"""
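
To make the universal character name point concrete, here is a small 
sketch, assuming C11 rules (6.4.3) and an ASCII-compatible execution 
character set.  C11 forbids UCNs below 00A0 except 0024 ($), 0040 (@) and 
0060 (`), so these three may be spelled as UCNs, whereas something like 
\u0041 for 'A' is a constraint violation because 'A' is in the basic 
character set.  The function name is hypothetical:

```c
#include <assert.h>
#include <string.h>

/* "\u0040" is permitted because '@' is outside the basic character set;
 * it should occupy the same storage as a plain "@". */
static_assert(sizeof "\u0040" == sizeof "@", "UCN and literal differ in size");

/* Hypothetical helper: returns 1 if the UCN spellings of @, $ and `
 * produce the same string as the characters typed directly. */
int ucn_matches_literal(void)
{
    return strcmp("\u0040\u0024\u0060", "@$`") == 0;
}
```

Had @ and $ been moved into the basic source character set without 
relaxing that rule, code that writes "\u0040" would have stopped being 
valid, which is the breakage the paragraph above refers to.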

I guess it was decided to add all three proposed characters during the 
Jan/Feb 2022 virtual meeting of WG14 as mentioned here:

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2913.htm

The first C2x draft that incorporated the change is this one:

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2912.pdf

--
-=( Ian Abbott <abbotti@xxxxxxxxx> || MEV Ltd. is a company  )=-
-=( registered in England & Wales.  Regd. number: 02862268.  )=-
-=( Regd. addr.: S11 & 12 Building 67, Europa Business Park, )=-
-=( Bird Hall Lane, STOCKPORT, SK3 0XA, UK. || www.mev.co.uk )=-