Re: Using C23 digit separators not locale digit grouping characters

Hi Brian, Branden,

On 1/29/23 22:04, Brian Inglis wrote:
> On 2023-01-29 07:38, Alejandro Colomar wrote:
>> On 1/28/23 21:40, Brian Inglis wrote:
>>> Seeing the recent tv_nsec patches drop the standard locale digit grouping character "," from the member range [0-999,999,999] made me regret the loss of the punctuation, which provides better and quicker comprehension of long strings of digits.

>> Nice! I didn't remember that separator.  It makes a lot of sense to use it in comments and the like in the pages.  Maybe we should be a bit more cautious in source code examples, but big numbers outside of running code should definitely have them.
> The major compilers support them as of draft C23, and the code is in examples, not source that has to compile on older compilers, so there is not much to be concerned about there, although more opinions would be helpful.

My version of gcc only supports it if you specify -std=c2x or -std=gnu2x. It hasn't been backported to -std=gnu17 (the default) so far, AFAICS.

$ cc -Wall -Wextra quote.c
quote.c: In function ‘main’:
quote.c:5:18: warning: multi-character character constant [-Wmultichar]
    5 |         int x = 1'23'4;
      |                  ^~~~
quote.c:5:18: error: expected ‘,’ or ‘;’ before '\x3233'
$ cc -Wall -Wextra quote.c -std=gnu2x
$


Since most people compile with default settings, I'd prefer to avoid that. When C23 is finally released and GCC switches to gnu23 by default, I'd also use it in example programs. Does that make sense to you?
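
A hedged sketch of how an example program could spell the tv_nsec bound with a C23 digit separator while still building with default compiler settings. NSEC_MAX and timespec_nsec_valid() are hypothetical names. The separator spelling is compiled only when __STDC_VERSION__ is at least 202311L, the C23 value; that threshold is an assumption chosen to be conservative, so a compiler in -std=c2x mode that reports an earlier draft value takes the plain-literal branch even if it would accept separators.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/*
 * NSEC_MAX is a hypothetical name.  The digit-separator spelling is
 * used only when the compiler reports C23 (202311L); older dialects,
 * such as a gnu17 default, get the plain literal instead.
 */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 202311L
# define NSEC_MAX  999'999'999L
#else
# define NSEC_MAX  999999999L
#endif

/* Return true if ts->tv_nsec is in the valid range [0, 999'999'999]. */
static bool
timespec_nsec_valid(const struct timespec *ts)
{
	assert(ts != NULL);
	return ts->tv_nsec >= 0 && ts->tv_nsec <= NSEC_MAX;
}
```

Once the default dialect is C23 everywhere that matters, the #else branch can simply be dropped.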


>> наб, would you please update your patches with that?  I also have a few
>> comments that I'll send shortly in replies to your patches.
>>> It may be time to consider using the locale-independent C23 digit
>>> separator character "'" wherever more than a handful of digits occur,
>>> possibly converting grouping characters in existing man pages as they
>>> are changed, and specifying a standard policy for the future to provide
>>> better and quicker comprehension of long strings of digits: perhaps
>>> using a new digit-separator string and glyph escape sequence (\*ds,
>>> \*[ds], \[ds], \(ds) if not already used by base groff?
>> The sequence for the unslanted single quote is \(aq.
> Granted, but wouldn't it be better to use a semantic digit-separator escape sequence in man pages, especially in running text, whose rendering could be tweaked, rather than a generic literal apostrophe used everywhere? If nothing else is proposed and accepted, I will use the generic \(aq; if changes are needed later, they can be targeted by their digit context.

We have little semantic markup in man(7), as opposed to mdoc(7). I think it will be simpler to just use \(aq.

Branden, any opinion?
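
In page source, a bare apostrophe may be rendered by groff as a typographic right single quote on some output devices, which is why \(aq is the safer spelling. A minimal sketch, assuming the input is a plain digit string with no other roff escapes: the hypothetical helper roff_digit_separators() rewrites a C23-style 999'999'999 into the man(7) source form 999\(aq999\(aq999.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy SRC to OUT, replacing every C23 digit
 * separator (') with the groff escape \(aq.  Returns false if OUT
 * (SIZE bytes) is too small for the result and its terminating NUL.
 */
static bool
roff_digit_separators(const char *src, char *out, size_t size)
{
	static const char  esc[] = "\\(aq";
	size_t             len = 0;

	assert(src != NULL && out != NULL);
	for (const char *p = src; *p != '\0'; p++) {
		const char  *piece = (*p == '\'') ? esc : p;
		size_t      n = (*p == '\'') ? strlen(esc) : 1;

		if (len + n >= size)  /* keep room for the NUL */
			return false;
		memcpy(out + len, piece, n);
		len += n;
	}
	if (len >= size)  /* only possible when SIZE is 0 */
		return false;
	out[len] = '\0';
	return true;
}
```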


>> We could add somewhere in man-pages(7) that decimal numbers should use a separator every 3 digits, and hex and binary should use it every 4 digits.
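
A sketch of that rule as a hypothetical C helper, format_grouped(), which is not an existing man-pages tool: it prints an unsigned value in base 2, 10 or 16 and inserts a separator every 3 decimal digits or every 4 binary/hex digits, counting from the least significant digit. The 0x/0b prefixes and lowercase hex digits are assumptions about house style.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: write V in BASE (2, 10 or 16) to OUT, inserting
 * SEP every 3 decimal or every 4 binary/hex digits from the right.
 * Returns false if OUT (SIZE bytes) is too small.
 */
static bool
format_grouped(uint64_t v, unsigned base, char sep, char *out, size_t size)
{
	static const char  xdigits[] = "0123456789abcdef";
	const char         *prefix;
	char               tmp[96];  /* 64 digits + 15 separators + NUL */
	size_t             pos, plen, dlen;
	unsigned           group, n;

	assert(base == 2 || base == 10 || base == 16);
	prefix = (base == 16) ? "0x" : (base == 2) ? "0b" : "";
	group = (base == 10) ? 3 : 4;

	/* Build the digits right to left at the end of tmp. */
	pos = sizeof(tmp);
	tmp[--pos] = '\0';
	n = 0;
	do {
		if (n > 0 && n % group == 0)
			tmp[--pos] = sep;
		tmp[--pos] = xdigits[v % base];
		v /= base;
		n++;
	} while (v != 0);

	plen = strlen(prefix);
	dlen = sizeof(tmp) - pos;  /* digits, separators and NUL */
	if (plen + dlen > size)
		return false;
	memcpy(out, prefix, plen);
	memcpy(out + plen, tmp + pos, dlen);
	return true;
}
```

With base 16 this yields 0x10'ffff for the last Unicode code point: grouping hex digits by 4 always puts the first separator between the low 16 bits and the rest.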
> Besides grouping decimal digits by 3 and binary/hex digits by 4, we could use yyyy'mm['dd]L for POSIX and similar date digit strings, and 0x10'ffff for Unicode code points, separating the plane index (Basic Multilingual Plane, Supplementary Multilingual Plane, ...) from the code within the plane; these are just examples from what I've seen so far.
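
Two hypothetical helpers sketching these conventions, assuming the input is the constant as written in the page (for example 200112L): group_date_literal() rewrites a yyyymm or yyyymmdd constant as yyyy'mm or yyyy'mm'dd, keeping any suffix, and cp_plane()/cp_offset() split a code point where the separator in 0x10'ffff falls. The plane is the code point shifted right by 16 bits (0 is the Basic Multilingual Plane, 1 the Supplementary Multilingual Plane), and the low 16 bits are the position within that plane.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: rewrite "200112L" (yyyymm) as "2001'12L" and
 * "20230129L" (yyyymmdd) as "2023'01'29L", keeping any suffix.
 * Returns false for other digit counts or if OUT is too small.
 */
static bool
group_date_literal(const char *lit, char *out, size_t size)
{
	size_t      ndigits = strspn(lit, "0123456789");
	const char  *suffix = lit + ndigits;
	int         len;

	if (ndigits == 6)
		len = snprintf(out, size, "%.4s'%.2s%s", lit, lit + 4, suffix);
	else if (ndigits == 8)
		len = snprintf(out, size, "%.4s'%.2s'%.2s%s",
		               lit, lit + 4, lit + 6, suffix);
	else
		return false;
	return len >= 0 && (size_t) len < size;
}

/* Plane index of a code point: 0 = BMP, 1 = SMP, ..., up to 0x10. */
static unsigned
cp_plane(uint32_t cp)
{
	assert(cp <= 0x10ffff);
	return cp >> 16;
}

/* Position of a code point within its plane (the low 16 bits). */
static unsigned
cp_offset(uint32_t cp)
{
	assert(cp <= 0x10ffff);
	return cp & 0xffff;
}
```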

> I've also noticed a lot of apparently random decimal digit strings that are powers of two or close to them: those would be easier to understand if rendered in text as Ki/Mi/Gi[+/-n]. Would that be preferable, using the IEC "i" suffix to avoid ambiguity?

In running text, I'd do it case by case. In some cases I guess that'll make sense. In others, 2^32 will make more sense... But yes, big magic fatnums are not nice.
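
A sketch of the Ki/Mi/Gi[+/-n] idea as a hypothetical helper, iec_approx(). The tolerance max_delta is an assumption, since the thread doesn't say how close a value must be to qualify: the helper tries Gi, then Mi, then Ki, rounds to the nearest multiple of each unit, and accepts the first one with a non-zero multiplier whose remainder is within the tolerance. With a tolerance of 16, 4294967295 (2^32 - 1) becomes 4 Gi-1 and 65536 becomes 64 Ki, while 1000000 has no match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Result of iec_approx(): n == mult * unit + delta. */
struct iec_value {
	uint64_t    mult;   /* multiplier, e.g. 4 in "4 Gi-1" */
	const char  *unit;  /* "Gi", "Mi" or "Ki" */
	int64_t     delta;  /* signed remainder, e.g. -1 in "4 Gi-1" */
};

/*
 * Hypothetical helper: express N as the nearest non-zero multiple of
 * the largest IEC unit (Gi, then Mi, then Ki) whose remainder is at
 * most MAX_DELTA in magnitude.  Returns false if no unit fits.
 */
static bool
iec_approx(uint64_t n, uint64_t max_delta, struct iec_value *out)
{
	static const struct {
		const char  *name;
		uint64_t    size;
	} units[] = {
		{ "Gi", UINT64_C(1) << 30 },
		{ "Mi", UINT64_C(1) << 20 },
		{ "Ki", UINT64_C(1) << 10 },
	};

	assert(out != NULL);
	for (size_t i = 0; i < sizeof(units) / sizeof(units[0]); i++) {
		uint64_t  q = n / units[i].size;
		uint64_t  r = n % units[i].size;
		uint64_t  mult = q;
		int64_t   delta = (int64_t) r;
		uint64_t  mag;

		if (r > units[i].size / 2) {  /* nearer to the next multiple */
			mult = q + 1;
			delta = (int64_t) r - (int64_t) units[i].size;
		}
		mag = (delta < 0) ? (uint64_t) -delta : (uint64_t) delta;
		if (mult == 0 || mag > max_delta)
			continue;
		out->mult = mult;
		out->unit = units[i].name;
		out->delta = delta;
		return true;
	}
	return false;
}
```

Printing is left to the caller, which would omit the delta when it is zero; values with no match would stay as grouped digits (or as 2^32-style expressions, case by case).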


>>> As well as the recently modified pages:
>>>
>>>     clock_getres.2
>>>     timer_settime.2
>>>     timerfd_create.2
>>>     utimensat.2
>>>
>>> there appear to be obvious occurrences only in the following pages:
>>>
>>>     futex.2
>>>     read.2
>>>     sendfile.2
>>>     write.2
>>>     mallopt.3
>>>     keyrings.7
>>>     mq_overview.7
>>>     sched.7
>>>     time_namespaces.7
>>>
>>> but there appear to be about 400 pages with more than 6 decimal digit
>>> strings (some of them spurious, such as glibc commit hashes and address
>>> output) where it could perhaps help, such as POSIX version dates, e.g.
>>> 2001'12L, and undoubtedly more with long digit strings in other radixes.
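
The thread doesn't say how those pages were counted. As a hypothetical C equivalent, count_long_digit_runs() counts maximal runs of at least min_len decimal digits in a piece of text. It knows nothing about hex digits, commit hashes or addresses, so its hits still need manual review; digit strings that already use separators are, by design, not counted.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: count the maximal runs of decimal digits in TEXT
 * that are at least MIN_LEN digits long.  A digit separator (') ends a
 * run, so already-grouped numbers are not counted.
 */
static size_t
count_long_digit_runs(const char *text, size_t min_len)
{
	size_t  count = 0;
	size_t  run = 0;

	assert(text != NULL && min_len > 0);
	for (const char *p = text; ; p++) {
		if (isdigit((unsigned char) *p)) {
			run++;
			continue;
		}
		if (run >= min_len)  /* a long run just ended */
			count++;
		run = 0;
		if (*p == '\0')
			break;
	}
	return count;
}
```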
>> Would you mind preparing a patch for all of those?  If you do, better
>> wait until we merge наб's patches, to avoid conflicts.
> I'll start anyway: I need to review over 300 files with over 900 digit strings, having already excluded a bunch more pages whose hits are output examples.

Sure.


> Any preference for how to split the edits into git commits: by section, by type of edit, separate commits for files with many edits, or...?

Whatever you prefer, I guess. I'd split first by kind of change, and then by the section within a page where it appears. But you're writing it, so you'll probably find the best split. As long as the patches are consistent enough that reviewing them doesn't require many context switches, it should be good.


> FYI, although many hits are likely output, the top candidates so far are:
>
> 80 man5/proc.5
> 55 man2/statfs.2
> 34 man7/feature_test_macros.7
> 32 man3/dl_iterate_phdr.3
> 30 man7/units.7
> 30 man5/rpc.5
> 23 man3/termios.3
> 20 man3/malloc_info.3
> 17 man2/userfaultfd.2
> 16 man7/keyrings.7
> 15 man7/time_namespaces.7
> 14 man7/posixoptions.7
> 14 man3/mallopt.3
> 13 man7/utf-8.7
> 12 man2/reboot.2
> 12 man2/keyctl.2


Cheers,

Alex
--
<http://www.alejandro-colomar.es/>


