Re: Escaping hyphens ("real" minus signs in groff)

[looping in groff@ because I'm characterizing an unresolved argument and
people may want to dispute my claims]

Hi Michael!

At 2021-01-20T22:03:12+0100, Michael Kerrisk (man-pages) wrote:
> Hi Branden,
> 
> I wonder if I might ask for your input...
> 
> For some time now, man-pages(7) has the text (mostly put there by me):
> 
>    Generating optimal glyphs
>        Where a real minus character is required (e.g., for  numbers  such
>        as  -1,  for  man  page cross references such as utf-8(7), or when
>        writing options that have a leading dash, such as in  ls -l),  use
>        the following form in the man page source:
> 
>            \-
> 
>        This guideline applies also to code examples.
> 
> (You even helped with this text a little, adding the piece about
> manual page cross-references.)
> 
> I'm having some doubts about this text. The doubts were triggered
> after I noticed that many code snippets (inside .EX/.EE blocks) don't
> follow this recommendation. I was about to apply a large patch that
> fixed that when I began to wonder: is it even necessary?

Short answer: yes, I would do that.

Long answer
===========

Some people (mostly BSD people, in my experience) argue that man pages
should "DWIM" and always render a "-" as an ASCII 45 hyphen-minus
regardless of context, and that, while we're at it, groff should stop
using non-ASCII glyph mappings for `, ', ^, and ~ as well.  I resist
this because it is contrary to the semantics troff has given these
characters since the early 1970s.

My most recent contretemps with people about this can be found starting
here:
	https://lists.gnu.org/archive/html/groff/2020-10/msg00158.html

The former groff maintainer and lead developer, Werner Lemberg, agrees
with me on this point, but some people whose *roff horizons seem to
extend only as far as man pages are passionately opposed.

The issue was not resolved on the groff mailing list and may not ever
be; the instant discussion got derailed by several people's fascination
with the Sun Gallant Demi font.  :-/

I share all this because it is a contentious issue and I cannot pretend
to represent my view as a universal consensus.  I do think, however,
that it is the opinion shared by people who have a fair knowledge of
*roff systems and who perceive the man(7) macro language as an
application of a typesetting system rather than an isolated
domain-specific language for man pages.

I grew tired of the fight before I could share my findings about
historical Unix manuals going back to Version 2.  I get the feeling
people don't really care; they'll happily wield the club of historical
continuity when it works in their favor, and discard it as irrelevant
when it doesn't.  But I can't say _I've_ never been guilty of that
inconsistency...

> Some thoughts/questions:
> 
> * I believe that when rendering to a terminal, the use of "\-" is
> equivalent to just "-"; they both render as a real minus sign (ASCII
> 055). Right?

It depends on the capabilities of the terminal, and specifically whether
it supports any hyphen, dash, or minus glyphs apart from ASCII 055.
Neither ASCII nor the ISO 8859 encodings did, and Windows-1252, which
does, is not a popular terminal encoding among Unix/Linux users.

But Unicode also does, and Unicode _is_ popular.  If you write a "raw"
roff document and render it to a UTF-8 terminal, you will be able to see
a difference.

Example:

$ printf "UTF-8 \\-1\n" | groff -Tutf8 | cat -s
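
To make the difference concrete, here is a minimal sketch that prints
the UTF-8 bytes of the three code points involved.  It assumes that, for
a raw roff document, groff's utf8 device emits U+2010 HYPHEN for "-" and
U+2212 MINUS SIGN for "\-"; that is worth verifying on your own system
by piping the command above through "od -c".  The helper name hex_of is
hypothetical and not part of groff.

```shell
# Print the bytes of a string as lowercase hex on one line.
# (hex_of is a hypothetical helper name, used only for illustration.)
hex_of() {
    printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
    echo
}

hyphen_minus='-'                   # U+002D HYPHEN-MINUS, ASCII 055 octal
hyphen=$(printf '\342\200\220')    # U+2010 HYPHEN, encoded as UTF-8
minus=$(printf '\342\210\222')     # U+2212 MINUS SIGN, encoded as UTF-8

hex_of "$hyphen_minus"    # prints 2d
hex_of "$hyphen"          # prints e28090
hex_of "$minus"           # prints e28892
```

Only the first of these is the single byte that shells and option
parsers treat as a dash; the other two look similar on screen but are
three-byte sequences.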

Back when people started using UTF-8 terminals, confusion of - and \- in
man pages was even more rampant than it is today, and groff added
directives to the man(7) implementation[1] to deliberately degrade
glyphs to ASCII.

.\" For UTF-8, map some characters conservatively for the sake
.\" of easy cut and paste.
.
.if '\*[.T]'utf8' \{\
.  rchar \- - ' `
.
.  char \- \N'45'
.  char  - \N'45'
.  char  ' \N'39'
.  char  ` \N'96'
.\}

It was intended as a stopgap measure, but thanks to development on groff
slowing down and its maintainer retiring from the role, it's remained
the case for about a decade, and some people now regard the stopgap as
an eternal truth that must be preserved, lest all writers of
documentation defect to Markdown or something.

The above probably should have been placed in the man.local file
instead[2][3], to encourage system administrators to make transitions
away from the stopgap as their sites or distributions deemed suitable.
I have proposed this very thing for the next groff release, 1.23.0, but
even that met with stiff resistance from the BSD camp.  They want cement
poured over the code snippet above.

> * When rendering to PDF, then "\-" and "-" certainly produce different
> results: the former produces a long dash, while the other produces a
> rather short dash.

Yes.  Specifically, the issue depends on whether a _font_ distinguishes
a hyphen from a minus sign.  (To a typographer, there's _no such thing_
as a "hyphen-minus", the ISO name for ASCII 055--or at least there
wasn't until computer character encodings forced compromises onto the
world.) But matters are made muddy by the fact that terminal emulators
impose another layer between the typesetter (*roff) and the fonts used
to draw glyphs.  groff's solution is to use the encoding of the locale
as a proxy for font coverage, which works well only if the font has
coverage for all the glyphs of interest to a document.  Over time this
has become increasingly true for fonts widely used in terminal emulators
and glyphs commonly encountered in practical documents like man
pages.[4]

> Certainly, when writing say "-1" in running text (i.e., not in a
> .EX/.EE code example), one should use "\-1", since without the "\",
> the dash in front of the "1" is rather anaemically small when rendered
> in PDF.

Yes.

> The same is true when writing options strings such as "ls -l". We
> should use "ls \-l" to avoid an anaemic hyphen in PDF.

Yes.

> When writing man-pages xrefs (e.g., utf-8), the use of "\-" produces a
> dash that is almost too long for my taste, but is preferable to the
> result from using "-", where the rendered dash is too small.

I share your discomfort with the length of the dash in man page xrefs,
and also your assessment that it's the lesser evil.

Another issue to consider is that as PDF rendering technology has
improved on Linux, it has become possible to copy and paste from PDF
documents into a terminal window.  In my opinion we should make this
work as well as we can.  Expert Linux users may not ever do this,
wondering why anyone would ever try; new Linux users will quite
reasonably expect to be able to do it.

> Inside code blocks (.EX/.EE) is there any reason to use "\-" rather
> than just "-"? Long ago I think I convinced myself that "\-" should be
> used, but now I am not at all sure that it's necessary. Maybe I forgot
> something, and you might remind me why "\-" is needed (and I will make
> sure to add the reason to man-pages(7)).

Yes; the main reason is so that copy-and-paste from code examples in
your man pages will work if people _don't_ use the degraded character
translations in man.local, which are marked as optional.

And I mean copy-and-paste not just from PDF but from a terminal window.
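
The copy-and-paste failure can be sketched without groff at all.  The
example below assumes that an unescaped "-" was rendered as U+2010
HYPHEN and then pasted into a shell; is_option is a hypothetical
stand-in for the leading-dash test that option parsers perform, not code
from any real utility.

```shell
# Classify an argument the way a typical option parser would:
# only a leading ASCII hyphen-minus (U+002D) marks an option.
# (is_option is a hypothetical name, used only for illustration.)
is_option() {
    case $1 in
        -*) echo option ;;
        *)  echo operand ;;
    esac
}

ascii_arg='-l'                        # what the author meant
pasted_arg=$(printf '\342\200\220l')  # U+2010 HYPHEN followed by "l"

is_option "$ascii_arg"     # prints option
is_option "$pasted_arg"    # prints operand
```

A command such as ls would therefore treat the pasted argument as a file
name rather than as the -l option.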

.EX and .EE, originating in the Version 9 Research Unix man macros, are
"semantic" but they don't _do_ very much.  They don't change
character-to-glyph mappings; they change the font family (on typesetter
devices like PDF, not terminals) and turn off filling.

> Are there any other things I've missed with respect to "\-" vs "-"?

Probably, but nothing I can think of right now.  <laugh>  It's a vexing
issue.

To get back to the question you originally posed, I think the change you
suggested (to consistently use \- in .EX/.EE regions) is sound, and will
not frustrate correct rendering even on systems that flatten the
distinction between the minus (\-) and hyphen (-) characters.
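
A change of that kind could be sketched with sed as follows.  This is
only an illustration under stated assumptions, not the patch under
discussion: escape_ex_hyphens is a hypothetical name, every hyphen
between .EX and .EE is assumed to be meant as a minus, and corner cases
such as roff comments or an escaped backslash directly before a hyphen
are not handled.

```shell
# Within .EX/.EE regions only, rewrite every hyphen as \-.
# Existing \- escapes are first reduced to a plain hyphen so that
# nothing gets escaped twice.
# (escape_ex_hyphens is a hypothetical name, used only for illustration.)
escape_ex_hyphens() {
    sed '/^\.EX/,/^\.EE/{
s/\\-/-/g
s/-/\\-/g
}'
}

# Usage: filter man page source on standard input.
printf '.EX\nls -l --color \\-a\n.EE\nutf-8\n' | escape_ex_hyphens
# prints:
# .EX
# ls \-l \-\-color \-a
# .EE
# utf-8
```

Text outside the .EX/.EE region is passed through unchanged, so any
output of such a filter would still need review by hand.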

Please follow up with any further questions and I will do my best to
answer them.

Regards,
Branden

[1] tmac/an-old.tmac
[2] Debian does this in its /etc/groff/man.local:

[...]
  .if n \{\
[...]
  .  \" Debian: Strictly, "-" is a hyphen while "\-" is a minus sign, and the
  .  \" former may not always be rendered in the form expected for things like
  .  \" command-line options.  Uncomment this if you want to make sure that
  .  \" manual pages you're writing are clear of this problem.
  .\" uncommented by Branden, 2019-06-16 --GBR
  .   if '\*[.T]'utf8' \
  .     char - \[hy]
  .
  .  \" Debian: "\-" is more commonly used for option dashes than for minus
  .  \" signs in manual pages, so map it to plain "-" for HTML/XHTML output
  .  \" rather than letting it be rendered as "&minus;".
  .  ie '\*[.T]'html' \
  .    char \- \N'45'
  .  el \{\
  .    if '\*[.T]'xhtml' \
  .      char \- \N'45'
  .  \}
  .\}

As you can see, I uncommented my local copy so that I could see if the
wrong glyphs were being used in man pages.  A large part of my work on
groff upstream has been on making the man pages better examples for
other man page writers to follow.

[3] As can be seen from the groff mailing list thread, Ingo Schwarze of
OpenBSD rejects the notion of man.local as a file suitable for site
administrators to customize.  I don't know enough about OpenBSD to
rationalize this view.

[4] To check the coverage of your terminal emulator's font, try the
command "man groff_char".  It contains a specimen of every defined groff
"special character" and in my opinion is a reasonable test of practical
glyph coverage[5].  For man pages, it's probably overpowered, even, but
man pages are merely the leading application of *roff, not the only one.

[5] I've largely rewritten the page for groff 1.23.0 (forthcoming)
because I was unhappy with what I perceived as its lack of clarity.  A
recent snapshot at the man-pages Web site[6] is a useful preview, but
(unless you use something like lynx or w3m) it won't tell you anything
about the glyph coverage of your _terminal_'s font.  In any event, the
glyph repertoire has not changed from groff 1.22.4.

[6] https://man7.org/linux/man-pages/man7/groff_char.7.html
