Re: Using C23 digit separators not locale digit grouping characters

Alejandro Colomar <alx.manpages@xxxxxxxxx> · Sun, 29 Jan 2023 22:19:49 +0100

Hi Brian, Branden,

On 1/29/23 22:04, Brian Inglis wrote:
On 2023-01-29 07:38, Alejandro Colomar wrote:
On 1/28/23 21:40, Brian Inglis wrote:
Seeing the recent tv_nsec patches drop the standard locale digit grouping 
characters "," from the member range [0-999,999,999] made me regret the loss 
of the punctuation which provides better and quicker comprehension of long 
strings of digits.

Nice! I didn't remember that separator.  It makes a lot of sense to use it 
in comments and the like in the pages.  Maybe we should be a bit more 
cautious in source code examples, but big numbers outside of running code 
should definitely have them.
The major compilers support them from draft C23, and the code is in examples, 
not source that has to compile on older compilers, so not much to be concerned 
about there, although some more opinions would be helpful.

My version of gcc only supports it if you specify -std=c2x or -std=gnu2x.  It 
hasn't been backported to -std=gnu17 (the default) so far, AFAICS.

$ cc -Wall -Wextra quote.c
quote.c: In function ‘main’:
quote.c:5:18: warning: multi-character character constant [-Wmultichar]
    5 |         int x = 1'23'4;
      |                  ^~~~
quote.c:5:18: error: expected ‘,’ or ‘;’ before '\x3233'
$ cc -Wall -Wextra quote.c -std=gnu2x
$

Since most people would be compiling with default settings, I prefer avoiding 
the separator in example programs for now.  When C23 is finally released, and 
GCC switches to gnu23 by default, I'd also use it in example programs.  Does 
it make sense to you?

наб, would you please update your patches with that?  I also have a few
comments that I'll write in a moment in answers to your patches.
It may be time to consider using the locale independent C23 digit
separator characters "'" wherever more than a handful of digits occur,
possibly convert grouping character uses in existing man pages as they are
changed, and specify a future standard policy approach to provide better
and quicker comprehension of long strings of digits: perhaps using a new
digit separator register and glyph escape sequence \*ds \*[ds] \[ds] \(ds
if not in use by base groff?
The sequence for the unslanted single quote is \(aq.
Granted, but would it not be better to consider using a semantic digit separator 
groff man escape sequence, especially in text, whose rendering could be tweaked, 
rather than a generic literal apostrophe quote used everywhere?
If nothing else is proposed and accepted, I will use the generic \(aq, and if 
future changes are required, they can be targeted by digit context.

We have few semantic things in man(7), as opposed to mdoc(7).  I think it 
will be simpler to just use \(aq.

Branden, any opinion?

We could add somewhere in man-pages(7) that decimal numbers should use a 
separator every 3 digits, and hex and binary should use it every 4 digits.
As well as the 3 decimal, 4 binary/hex, we could use yyyy'mm['dd]L for POSIX and 
similar date digit strings, and 0x10'ffff for Unicode code points, 
distinguishing between the Basic and Supplementary Multilingual Plane indices 
and codes, just as examples from what I've seen so far.

I've also noticed a lot of apparently random decimal digit strings that are 
binary powers or close deltas: those would be more comprehensible if rendered in 
text as Ki/Mi/Gi[+/-n], so would that be preferable, using the IEC i suffix to 
avoid ambiguity?

In running text, I'd do it case by case.  In some cases I guess that'll make 
sense.  In others, 2^32 will make more sense...  But yes, big magic fatnums are 
not nice.

As well as the recently modified pages:

    clock_getres.2
    timer_settime.2
    timerfd_create.2
    utimensat.2

there appear to be obvious occurrences in only the following pages:

    futex.2
    read.2
    sendfile.2
    write.2
    mallopt.3
    keyrings.7
    mq_overview.7
    sched.7
    time_namespaces.7

but there appear to be about 400 pages containing strings of more than 6
decimal digits (some are spurious hits on glibc commit hashes and address
outputs) where it could perhaps help, such as in POSIX version dates, e.g.
2001'12L, and undoubtedly more with long digit strings in other radixes.
Would you mind preparing a patch for all of those?  If you do, it'd be better
to wait until we merge наб's patches, to avoid conflicts.
I'll start anyway, need to review over 300 files with over 900 digit strings, 
having cut a bunch more pages with output examples.

Sure.

Any preferred way to split the changes into commits: by section, by type of 
edit, separate commits for files with many edits, or...?

Whatever you prefer, I guess.  I think the first division I'd do is in the kind 
of change, and then in the section within a page where it appears.  But, you 
write it, so I guess you'll find the best separation.  As long as patches are 
consistent enough to not have many context switches when reviewing, it should be 
good.

FYI although many hits are likely output, the top candidates so far are:

80 man5/proc.5
55 man2/statfs.2
34 man7/feature_test_macros.7
32 man3/dl_iterate_phdr.3
30 man7/units.7
30 man5/rpc.5
23 man3/termios.3
20 man3/malloc_info.3
17 man2/userfaultfd.2
16 man7/keyrings.7
15 man7/time_namespaces.7
14 man7/posixoptions.7
14 man3/mallopt.3
13 man7/utf-8.7
12 man2/reboot.2
12 man2/keyctl.2

Cheers,

Alex
--
<http://www.alejandro-colomar.es/>