Re: groff features for hyperlinked man pages (was: No 6.05/.01 pdf book available)


Hi Alex,

At 2023-08-18T15:50:21+0200, Alejandro Colomar wrote:
> On 2023-08-15 02:50, G. Branden Robinson wrote:
> >>> I just re-read this, and am confused.  '\-' is an ASCII character,
> >>> isn't it?  In fact, all of the Linux man-pages pathnames are
> >>> composed exclusively of ASCII characters, aren't they?
[...]
> > You're thinking about this at the wrong level, Alex.  `\-` is a
> > *roff special character.  Unless converted to something else by
> > character translation or character definition,[7] it goes to the
> > device-independent page description language as a special character
> > too.
[...]
> > It is up to the output device to decide what to do with that.
> > groff's "ascii" and "latin1" output devices put out a U+002D
> > character; its "utf8" device puts out a minus sign, U+2212.  Now,
> > before anyone defecates a brick about the U+2212 not being easily
> > greppable, nor useful for copying and pasting to a shell prompt, the
> > man(7) and mdoc(7) macro packages override that.
> 
> So, \- is kept as a special character, even in man(7), until output
> drivers translate it to ASCII -?

In *roff, any character, ordinary or special, can be "translated" to any
other with the `tr` request.

.tr AB \" translate "A" to "B"
.tr -\- \" translate ordinary char '-' to special char '-'
.tr \[aq]' \" translate special char 'aq' to ordinary char "'"

The resemblance to Unix tr(1) is not coincidental.

In GNU troff, context-dependent translations are also available through
the `trin` and `trnt` requests, which serve fairly specialized purposes.
Beneath that,[1] you can _redefine_ any ordinary or special character.

The formatter applies character translations (and, in GNU troff,
definitions) before producing output.

> Or which program does the translation?

Output devices can perform translations too.  In the above example, "'"
doesn't "remain" "'"; if the output device has directional single
quotes, groff's font descriptions will assign it to the glyph for U+2019
or similar.
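
As a rough illustration of the two stages described above, here is a toy
Python model.  It is not groff's implementation or any groff API; the
names `DEVICE_GLYPHS` and `render` are invented for this sketch, and the
tables cover only the characters mentioned in this thread.  The
formatter stage applies translations first, and the output device then
picks a glyph.

```python
# Toy model only: the table and function names are hypothetical and do
# not correspond to groff's data structures.
HYPHEN_MINUS = "\u002d"
MINUS_SIGN = "\u2212"
RIGHT_SINGLE_QUOTE = "\u2019"

# What each output device does with a few characters, per the thread.
DEVICE_GLYPHS = {
    "ascii": {"\\-": HYPHEN_MINUS},
    "latin1": {"\\-": HYPHEN_MINUS},
    "utf8": {"\\-": MINUS_SIGN, "'": RIGHT_SINGLE_QUOTE},
}


def render(chars, device, formatter_translations=None):
    """Resolve a list of ordinary/special characters to output text."""
    translations = formatter_translations or {}
    out = []
    for ch in chars:
        # Formatter stage: `tr`-style translation happens first.
        ch = translations.get(ch, ch)
        # Device stage: the device decides on the glyph.  Passing
        # unknown characters through unchanged is a simplification.
        out.append(DEVICE_GLYPHS[device].get(ch, ch))
    return "".join(out)


# A macro package override, modeled here as a plain translation of the
# special character `\-` to the ordinary character "-".
man_override = {"\\-": "-"}

print(render(["\\-"], "ascii"))               # hyphen-minus
print(render(["\\-"], "utf8"))                # minus sign
print(render(["\\-"], "utf8", man_override))  # hyphen-minus again
```

The last call shows why the override matters: once the formatter has
turned `\-` into an ordinary "-", the "utf8" device in this model never
sees the special character at all.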

Some time perusing the 1.23.0 groff_char(7) and groff_font(5) man pages
will be rewarded.  I hope one day soon to revise groff_out(5) and the
"Using Symbols" section of groff's Texinfo manual to my
satisfaction--the latter will likely drive updates to groff(7)--and by
then the path from input characters to visible output glyphs should be
completely illuminated.  If you were to call this stuff frustratingly
complex, I'd agree.  Most of the complexity exists for good reasons,
though some of those are historical.  The responsible update of
technical documentation entails unearthing and presenting those reasons.

> If it's gropdf(1) that makes the translation, I guess it will also be
> able to perform the same translation for MR.

The translation of `\-` to `-` specifically for the purpose of writing
PDF metadata (bookmarks) via troff device control commands is _extremely
specialized_.  No man page author should ever have to deal with it.

> If the translation has already been made by troff(1), then gropdf(1)
> shouldn't care.  In any case, I still don't see the problem.

If Deri and I do our jobs right, you won't need to care, nor see any
problems.  We're workin' on it.  (Mostly Deri has been, to date.  My
"contribution" has mainly been to look at an.tmac on one hand and the
"pdfhref" macro on the other and stare slack-jawed, wondering how the
hell I'll ever get the impedances to match.  Don't be surprised if
something gets refactored.)

> Indeed.  I won't violently protest to Deri's experiments, as I do
> worse aberrations while experimenting, but I would if this went into
> groff(1).  :)

I don't know what the current state of play with respect to a
four-argument `MR` in Deri's branch is.  I'll let him speak to it.  I'd
prefer not to undertake a code review half-cocked (and ill-prepared,
besides).

> > One of the selling points of `MR` is less typing (no parentheses).
> 
> I wouldn't really buy it just for that.  ;)

No indeed, which is why it has much bigger reasons to recommend it,
namely those cited in groff's NEWS file.

  Inclusion of the `MR` macro was prompted by its introduction to
  Plan 9 from User Space's troff in August 2020.  Its purpose is to
  ameliorate several long-standing problems with man page cross
  references: (1) the package's lack of inherent hyperlink support for
  them; (2) false-positive identification of strings resembling man page
  cross references (as can happen with "exit(1)", "while(1)",
  "sleep(5)", "time(0)" and others) by terminal emulators and other
  programs; (3) the unwanted intrusion of hyphens into man page topics,
  which frustrates copy-and-paste operations (this problem has always
  been avoidable through use of the \% escape sequence, but cross
  references are frequent in man pages and some page authors are
  inexpert *roff users); and (4) deep divisions in man page maintenance
  communities over which typeface should be used to set the man page
  topic (italics, roman, or bold).

> In fact, not having a RM variant, it's more typing when (foo(1)).

It's exactly the same.  Check these with `wc -c`.

All your questions are answered elsewhere
.RB ( foo (1) ).

All your questions are answered elsewhere (\c
.MR foo 1 ).
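
For anyone without a shell handy, the same check can be sketched in
Python.  The variable and function names are made up for this
illustration; the two strings are the inputs shown above, each line
ending in a newline as `wc -c` would see them in a file.

```python
# The two inputs from above, with a trailing newline on each line.
rb_form = "All your questions are answered elsewhere\n.RB ( foo (1) ).\n"
mr_form = "All your questions are answered elsewhere (\\c\n.MR foo 1 ).\n"


def byte_count(text):
    """Count bytes the way `wc -c` would for ASCII input."""
    return len(text.encode("ascii"))


print(byte_count(rb_form), byte_count(mr_form))
```

Both counts come out to 59 bytes, newlines included.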

I admit that the mysterious and frightening `\c` escape sequence punches
well above its weight in key strokes, however.  I'm trying to fix that.

groff_man_style(7):

     \c   End a text line without inserting space or attempting a break.
          Normally, if filling is enabled, the end of a text line is
          treated like a space; an output line may be broken there (if
          not, an adjustable space is inserted); if filling is disabled,
          the line will be broken there, as in .EX/.EE examples.  The
          next line is interpreted as usual and can include a macro call
          (contrast with \newline).  \c is useful when three font styles
          are needed in a single word, as in a command synopsis.

               .RB [ \-\-stylesheet=\c
               .IR name ]

          It also helps when changing font styles in .EX/.EE examples,
          since they are not filled.

               .EX
               $ \c
               .B groff \-T utf8 \-Z \c
               .I file \c
               .B | grotty \-i
               .EE

          Alternatively, and perhaps with better portability, the \f
          font selection escape sequence can be used; see below.  Using
          \c to continue a .TP paragraph tag across multiple input lines
          will render incorrectly with groff 1.22.3, mandoc 1.14.1,
          older versions of these programs, and perhaps with some other
          formatters.

I know that's a lot of words, but I want man page writers to use `\c`
without fear (and only) where it is necessary.  Six years ago when I
raised the question on the groff list, no one could explain it to my
satisfaction.  (That may be a "me" problem, not an "everyone else" one.
But other people have expressed unease with `\c` too.)

Incidentally:

A gear finally turned for me just in the past day or two when I realized
that the only reason the man(7) package has font style alternation
macros in the first place is that the *roff `it` request ignores
`\c`.  This was one of the first tricky groff issues I tried to grapple
with.  Years later, I understand that it was a painful botch for `it` to
ignore `\c`, particularly when `ce` did not.[2]

I felt like a fool for not realizing this sooner, but I've never seen
anyone else explicate this point, so maybe I'm not the dunce I think.
(Or I am, but for some other reason. :P )

If Doc Brown loaned me his DeLorean, I'd go back and plead with Joe
Ossanna to fix this in the early days.  If I could drive 88mph only back
to 1979, I'd ask Doug McIlroy to not use input traps in the man(7)
macros at all.

> I'd expect that the hyperlinking ability should be modifiable with
> groff(1) --I don't care at what level of the pipeline--, similar to
> how it was modifiable with man2html(1).  But the source code shouldn't
> know about it.

Please point me to which man2html(1) implementation you mean.[3]  I can
have a look and evaluate.

> I'd like groff(1) to figure out some name that resembles the text used
> as man page reference, or as section heading; I don't want to specify
> it.
[...]
> > Looking farther ahead, I think a further step is required if we're
> > going to have intra-page links; we're going to have to have a way to
> > disambiguate duplicates.  In practice there's not much risk from
> > having duplicate section titles in man pages, but I reckon a big,
> > complex page could duplicate subsection titles.  And if we
> > automatically generate hyperlink tags for paragraph tags, those
> > would likely need it as well.  Maybe representing such internal
> > anchors hierarchically will be enough: "section_subsection_tag" or
> > something like that.
> 
> Yep.  I'd expect something like that.  You could also include the page
> name in a book, which would involve the changes suggested by Deri of
> not having the page title hardcoded as the first level, right?

Yeah, we could put a page identifier (constructed from the first two
arguments to `TH`) as the first component of the anchor.  Would this
even need to be conditional?  Should we simply always do it?
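
To make the idea concrete, here is a minimal Python sketch of such an
anchor scheme.  Everything in it is an assumption for illustration: the
function names `slug` and `make_anchor`, the "." joining the two `TH`
arguments, and the "_" separator between levels are not anything groff
does today.

```python
import re


def slug(text):
    """Lowercase; collapse runs of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def make_anchor(th_title, th_section, *levels):
    """Build a hierarchical anchor with the page identifier first.

    th_title and th_section stand for the first two arguments to `TH`;
    levels are the section, subsection, and paragraph tag, outermost
    first.  Hypothetical scheme, not an existing groff feature.
    """
    page_id = f"{slug(th_title)}.{slug(th_section)}"
    return "_".join([page_id] + [slug(level) for level in levels])


print(make_anchor("groff_man_style", "7", "Description", "Font style macros"))
print(make_anchor("ls", "1"))
```

Because `slug` never emits an underscore, "_" stays unambiguous as the
level separator, and always prefixing the page identifier keeps anchors
unique when many pages are gathered into one book.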

Regards,
Branden

[1] Or, possibly, on top of it.  https://savannah.gnu.org/bugs/?62691

[2]

$ cat /tmp/it-vs-ce.tr
.de AA
BOOM!
..
.
.ie \n(.g .itc 1 AA
.el       .it 1 AA
Tick.  \c
Tick.
.ce 1
Clean up \c
debris.
$ nroff --version | sed 1q
GNU nroff (groff) version 1.23.0
$ nroff /tmp/it-vs-ce.tr | sed 3q
Tick.  Tick.  BOOM!
                        Clean up debris.

$ pdp11
PDP-11 simulator V3.8-1
sim> set cpu 11/45
Disabling XQ
sim> set tto 7b
sim> att rl unix_v7_rl.dsk
sim> boot rl
@boot
New Boot, known devices are hp ht rk rl rp tm vt
: rl(0,0)rl2unix
mem = 177856
# Restricted rights: Use, duplication, or disclosure
is subject to restrictions stated in your contract with
Western Electric Company, Inc.
Thu Sep 22 22:00:20 EDT 1988

login: dmr
$ nroff /tmp/it-vs-ce.tr | sed 3q
Tick.  BOOM!  Tick.
                        Clean up debris.


[3] https://invisible-island.net/scripts/man2html.html#same-name
