Re: [PATCH] scanf.3: Do not mention the ERANGE error

On 14/12/2022 11:23, Alejandro Colomar wrote:


On 12/14/22 11:52, Ian Abbott wrote:

'@' isn't included in C's basic character set though.  '&' is available.

Just a curious question from someone ignorant of the subject: what's the difference between the basic character set and the source character set?

The source character set may contain locale-specific characters outside the basic source character set.

Actually, there are two basic character sets - the basic source character set and the basic execution character set (which includes the basic source character set plus a few control characters).  The source character set and/or execution character set may contain locale-specific, extended characters outside the basic character set.

https://port70.net/~nsz/c/c11/n1570.html#5.2.1
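To make the distinction concrete, here is a minimal sketch of a membership test for the basic source character set, going by the list in C11 5.2.1p3 (26 uppercase letters, 26 lowercase letters, 10 digits, 29 graphic characters, plus space, horizontal tab, vertical tab and form feed). The function name is_basic_source_char is made up for illustration, and the sketch assumes it is compiled with an ordinary ASCII-compatible execution character set.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not from any library): returns true if c is a
 * member of the basic source character set as listed in C11 5.2.1p3.
 * The end-of-line indicator is deliberately left out, because the
 * standard treats it separately from the listed characters. */
bool is_basic_source_char(int c)
{
    static const char basic[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789"
        "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~"
        " \t\v\f";

    /* strchr() would also match the terminating null, so reject it. */
    return c != '\0' && strchr(basic, c) != NULL;
}
```

With this, is_basic_source_char('&') is true, while '@', '$' and '`' all give false, which matches the point made earlier in the thread.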

I still have a small doubt.  C23 added '@' to the source character set, but it seems to be a second-class citizen:



  The execution character set may also contain multibyte characters, which need not have the same encoding as for the source character set. For both character sets, the following shall hold:
  - The basic character set, @, $, and ` shall be present and each character shall be encoded as a single byte.

What's the difference, and why isn't it part of the basic character set?  Maybe because not all keyboards have those three characters?

I think the inability to type certain characters in the basic source character set is the reason why the language contains the horrible trigraph sequences (no longer valid since the C23 final draft N3054), and the slightly less horrible digraph tokens.
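For anyone who has not met the digraph tokens, here is a small sketch of what they look like in practice. The digraphs <: :> <% %> stand for [ ] { } (and %: stands for #). The function names are made up for illustration. Unlike trigraphs, which were replaced everywhere in an early translation phase, digraphs are ordinary tokens, so they are left alone inside string literals.

```c
#include <string.h>

/* Hypothetical example: sums a small array, written with digraph
 * tokens in place of the brackets and braces. */
int sum_with_digraphs(void)
<%
    int values<:3:> = <% 1, 2, 3 %>;
    int total = 0;

    for (int i = 0; i < 3; i++)
        total += values<:i:>;
    return total;
%>

/* Digraphs are tokens, not textual replacements, so the string "<:"
 * stays two characters long and does not compare equal to "[". */
int digraph_is_literal_in_string(void)
{
    return strlen("<:") == 2 && strcmp("<:", "[") != 0;
}
```

A compiler that supports digraphs (C95 and later) should treat sum_with_digraphs() exactly as if it had been written with the usual punctuation.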

Here is the rationale for inclusion of @ and $ in the source and execution character sets, but ` is only mentioned briefly as an also-ran at the end of the document in section "Do we also want to add ` in the same way as @ and $?":

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2701.htm

The rationale for exclusion of @ and $ characters from the basic character set is given in this paragraph from the document:

"""
By requiring @ and $ in the source and execution character set we, reach the goal of making them useable in comments and string literals. By not adding them to the basic source character set, we protect the freedom of implementations of allowing or disallowing them in identifiers, and avoid inconsistency or incompability regarding the use of universal character names (currently the use of universal character names for characters in the basic source character set is not allowed, so adding characters to the basic source character set without lifting that restriction could break existing code).
"""
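The universal character name restriction mentioned in that paragraph can be seen in a short sketch. As I read C11 6.4.3, a universal character name below 00A0 is only allowed for 0024 ($), 0040 (@) and 0060 (`), which is exactly the set of three characters under discussion. The function name is made up, and the comparison assumes an ASCII-compatible execution character set (the default for GCC and Clang on Linux).

```c
#include <string.h>

/* Hypothetical check: the three characters that are outside the basic
 * character set may be spelled as universal character names, and the
 * result is the same single character as the plain spelling. */
int ucn_matches_plain_spelling(void)
{
    return strcmp("\u0040", "@") == 0
        && strcmp("\u0024", "$") == 0
        && strcmp("\u0060", "`") == 0;
}

/* By contrast, spelling a member of the basic character set this way,
 * for example using the universal character name 0041 for 'A', violates
 * the constraint in 6.4.3 and should be diagnosed by the compiler. */
```

This is why moving @ and $ into the basic source character set without lifting the restriction could have broken code that already writes them as universal character names.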

I guess it was decided to add all three proposed characters during the Jan/Feb 2022 virtual meeting of WG14 as mentioned here:

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2913.htm

The first C2x draft that incorporated the change is this one:

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2912.pdf

--
-=( Ian Abbott <abbotti@xxxxxxxxx> || MEV Ltd. is a company  )=-
-=( registered in England & Wales.  Regd. number: 02862268.  )=-
-=( Regd. addr.: S11 & 12 Building 67, Europa Business Park, )=-
-=( Bird Hall Lane, STOCKPORT, SK3 0XA, UK. || www.mev.co.uk )=-



