Re: Issue in man page ascii.7.po

"G. Branden Robinson" <g.branden.robinson@xxxxxxxxx> · Mon, 14 Mar 2022 11:52:59 +1100

Hi Helge,

At 2022-03-13T13:34:22+0100, Helge Kreutzmann wrote:
> Without further ado, the following was found:
> 
> Issue:    In the right table, please add \& markup for end of sentence characters (? ! .) to get proper formatting in other locales. Thanks!

Specifically, if the additional inter-sentence space amount (the second
argument to the `ss` request) differs from the inter-word space amount,
the column alignment of this "table" (not a tbl(1) table) gets thrown
off.

This is an area that has seen significant clarification in the groff
Texinfo manual and other documentation since the 1.22.4 release, so I
ask the reader's indulgence while I quote it.

 -- Request: .ss word-space-size [additional-sentence-space-size]
 -- Register: \n[.ss]
 -- Register: \n[.sss]
     Set the sizes of spaces between words and sentences.(1)  (*note
     Manipulating Filling and Adjustment-Footnote-1::) Their units are
     twelfths of the space width of the current font.  Initially both
     the WORD-SPACE-SIZE and ADDITIONAL-SENTENCE-SPACE-SIZE are 12.
     Negative values are not permitted.  The request is ignored if there
     are no arguments.

     The first argument, the inter-word space size, is a minimum; if an
     output line undergoes adjustment, such spaces may increase in
     width.

     The optional second argument sets the amount of additional space
     separating sentences on the same output line.  If omitted, this
     amount is set to WORD-SPACE-SIZE.

     The read-only registers '.ss' and '.sss' hold the values of minimal
     inter-word space and additional inter-sentence space, respectively.
     These parameters are associated with the environment (*note
     Environments::), and rounded down to the nearest multiple of 12 on
     terminal output devices.

     Additional inter-sentence spacing is used only if the output line
     is not full when the end of a sentence occurs in the input.  If a
     sentence ends at the end of an input line, then both an inter-word
     space and an inter-sentence space are added to the output; if two
     spaces follow the end of a sentence in the middle of an input line,
     then the second space becomes an inter-sentence space in the
     output.  Additional inter-sentence space is not adjusted, but the
     inter-word space that always precedes it may be.  Further input
     spaces after the second, if present, are adjusted as normal.
[...]
   (1) *Note Filling:: and *note Sentences:: for the definitions of word
and sentence boundaries, respectively.
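
For anyone who wants to watch these registers change, here is a minimal
sketch (mine, not from the manual) that reports both values before and
after an `ss` request.  It assumes GNU troff; `tm` writes its argument
to the standard error stream.

```roff
.\" Report the spacing parameters as the formatter starts up.
.\" Per the text quoted above, both are initially 12.
.tm before: .ss=\n[.ss] .sss=\n[.sss]
.\" Keep the minimum inter-word space at 12/12 of a space width and
.\" request no additional inter-sentence space.
.ss 12 0
.tm after: .ss=\n[.ss] .sss=\n[.sss]
```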

> "   2 3 4 5 6 7       30 40 50 60 70 80 90 100 110 120\n"
> " -------------      ---------------------------------\n"
> "0:   0 @ P \\` p     0:    (  2  E<lt>  F  P  Z  d   n   x\n"
> "1: ! 1 A Q a q     1:    )  3  =  G  Q  [  e   o   y\n"
> "2: \" 2 B R b r     2:    *  4  E<gt>  H  R  \\e  f   p   z\n"
> "3: # 3 C S c s     3: !  +  5  ?  I  S  ]  g   q   {\n"
> "4: $ 4 D T d t     4: \"  ,  6  @  J  T  \\(ha  h   r   |\n"
> "5: % 5 E U e u     5: #  -  7  A  K  U  _  i   s   }\n"
> "6: & 6 F V f v     6: $  .  8  B  L  V  \\`  j   t   \\(ti\n"
> "7: \\(aq 7 G W g w     7: %  /  9  C  M  W  a  k   u  DEL\n"
> "8: ( 8 H X h x     8: &  0  :  D  N  X  b  l   v\n"
> "9: ) 9 I Y i y     9: \\(aq  1  ;  E  O  Y  c  m   w\n"
> "A: * : J Z j z\n"
> "B: + ; K [ k {\n"
> "C: , E<lt> L \\e l |\n"
> "D: - = M ] m }\n"
> "E: . E<gt> N \\(ha n \\(ti\n"
> "F: / ? O _ o DEL\n"

The piece of ascii(7) quoted above renders as expected if none of the
groff localization macro files are loaded, and if the user/administrator
has not changed the additional inter-sentence space amount in "troffrc"
or "man.local"--but doing so is supported.  A common preference, and one
shared by the Czech, German, French, Italian[1], and Swedish groff
localization files, is to set additional inter-sentence space to zero
with `.ss 12 0`.

Here is the result with `.ss 12 0` in effect.

   Tables                                          │
       For convenience, below are more compact tables in hex and
       decimal.

          2 3 4 5 6 7       30 40 50 60 70 80 90 100 110 120
        -------------      ---------------------------------
       0:   0 @ P ` p     0:    (  2  <  F  P  Z  d   n   x
       1: ! 1 A Q a q     1:    )  3  =  G  Q  [  e   o   y
       2: " 2 B R b r     2:    *  4  >  H  R  \  f   p   z
       3: # 3 C S c s     3: ! +  5  ? I  S  ]  g   q   {
       4: $ 4 D T d t     4: "  ,  6  @  J  T  ^  h   r   |
       5: % 5 E U e u     5: #  -  7  A  K  U  _  i   s   }
       6: & 6 F V f v     6: $  . 8  B  L  V  `  j   t   ~
       7: ' 7 G W g w     7: %  /  9  C  M  W  a  k   u  DEL
       8: ( 8 H X h x     8: &  0  :  D  N  X  b  l   v
       9: ) 9 I Y i y     9: '  1  ;  E  O  Y  c  m   w
       A: * : J Z j z
       B: + ; K [ k {
       C: , < L \ l |
       D: - = M ] m }
       E: . > N ^ n ~
       F: / ? O _ o DEL

(Yes, there is a stray pipe symbol on the same line as the subsection
heading.[2])

I've confirmed that Helge's solution works.  In principle, it is fragile
in locales that have other sentence-ending characters, but I know of no
such locales--none are extant in groff, pending, or requested.
Therefore I'm +1 on this.
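
To make the fix concrete, here is a minimal sketch using row 3 of the
decimal table.  The standalone framing with `nf`/`fi` is an assumption
for illustration; the page's actual source may set up the block
differently.

```roff
.\" Zero additional inter-sentence space, as several groff
.\" localization files do.
.ss 12 0
.nf
.\" Unprotected: the second space after "!" and "?" is treated as
.\" inter-sentence space, which now has zero width, so the columns
.\" to the right shift left.
3: !  +  5  ?  I  S  ]  g   q   {
.\" Protected: \& after the punctuation prevents end-of-sentence
.\" recognition, so both spaces are ordinary word spaces.
3: !\&  +  5  ?\&  I  S  ]  g   q   {
.fi
```

The first row should come out misaligned as in the rendering above, and
the second should keep its columns.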

Perhaps better changes would be to (1) have the Linux man-pages start
using groff's EX/EE macros for this and (2) change groff's EX/EE macros
to start doing what everyone already thinks they do, and shut off
additional inter-sentence space (temporarily).  These would be
supplemental to the existing proposed fix.  Having the additional `\&`
escape sequences will cause no harm, and might be salutary examples.
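
As a rough sketch of idea (2), the wrappers below save the additional
inter-sentence space, zero it for the duration of the example, and
restore it afterward.  The names `XEX`, `XEE`, and `saved-sss` are
hypothetical, chosen only for illustration; this is not how groff's
EX/EE are defined, just one way the behavior might be approximated.

```roff
.\" Hypothetical wrapper: remember the current additional
.\" inter-sentence space, set it to zero, then begin the example.
.de XEX
.  nr saved-sss \\n[.sss]
.  ss \\n[.ss] 0
.  EX
..
.\" Hypothetical wrapper: end the example and restore the saved
.\" additional inter-sentence space, leaving word space unchanged.
.de XEE
.  EE
.  ss \\n[.ss] \\n[saved-sss]
..
```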

I noticed just last night that the iso-8859*(7) man pages have a much
worse problem; they use raw 8-bit characters in the input, which leads
to UTF-8 mojibake and/or confusing and incorrect character names for the
glyphs that appear when you render one ISO 8859 encoding's page under
another encoding.  (man-db man(1) hides this problem, possibly by using
its manconv(1) utility--but man pages should be written so that
troff -man works.)  The correct thing to do is use groff special
character escape sequences; these _name_ the desired glyph and are more
robust to character encoding conversions (albeit requiring use of
preconv(1)).
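
For illustration, here is a short sketch of what naming glyphs looks
like.  The glyphs are my own picks, not taken from the iso-8859*(7)
pages; the names are ones documented in groff_char(7).

```roff
.\" Each glyph is named rather than encoded, so the input stays
.\" plain ASCII regardless of the reader's character encoding.
.nf
\[:a]  a with dieresis
\['e]  e with acute accent
\[ss]  German sharp s
\[u00E9]  e with acute accent again, by Unicode code point
.fi
```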

Anyone have thoughts on any of the above?

Regards,
Branden

[1] forthcoming in groff 1.23
[2] This appears to be because the preceding tbl(1) table is too wide
    for 78 columns.  I'll have a look and see if I can tweak it.  Or
    this may be a tbl(1) bug; several have been fixed over the past
    couple of years[3].
[3] https://savannah.gnu.org/bugs/index.php?go_report=Apply&group=groff&func=&set=custom&msort=0&report_id=101&advsrch=0&status_id=3&resolution_id=1&submitted_by=0&assigned_to=0&category_id=109&bug_group_id=0&severity=0&summary=&details=&sumORdet=0&history_search=0&history_field=0&history_event=modified&history_date_dayfd=14&history_date_monthfd=3&history_date_yearfd=2022&chunksz=50&spamscore=5&boxoptionwanted=1#options