Hi Alex,

At 2023-08-18T15:50:21+0200, Alejandro Colomar wrote:
> On 2023-08-15 02:50, G. Branden Robinson wrote:
> >>> I just re-read this, and am confused. '\-' is an ASCII character,
> >>> isn't it? In fact, all of the Linux man-pages pathnames are
> >>> composed exclusively of ASCII characters, aren't they?
[...]
> > You're thinking about this at the wrong level, Alex. `\-` is a
> > *roff special character. Unless converted to something else by
> > character translation or character definition,[7] it goes to the
> > device-independent page description language as a special character
> > too.
[...]
> > It is up to the output device to decide what to do with that.
> > groff's "ascii" and "latin1" output devices put out a U+002D
> > character; its "utf8" device puts out a minus sign, U+2212. Now,
> > before anyone defecates a brick about the U+2212 not being easily
> > greppable, nor useful for copying and pasting to a shell prompt, the
> > man(7) and mdoc(7) macro packages override that.
>
> So, \- is kept as a special character, even in man(7), until output
> drivers translate it to ASCII -?

In *roff, any character, ordinary or special, can be "translated" to any
other with the `tr` request.

.tr AB \" translate "A" to "B"
.tr -\- \" translate ordinary char '-' to special char '-'
.tr \[aq]' \" translate special char 'aq' to ordinary char "'"

The resemblance to Unix tr(1) is not coincidental.

In GNU troff, context-dependent translations are available (for fairly
specialized purposes--`trin` and `trnt`).

Beneath that,[1] you can _redefine_ any ordinary or special character.

The formatter applies character translations (and, in GNU troff,
definitions) before producing output.

> Or which program does the translation?

Output devices can perform translations too. In the above example, "'"
doesn't "remain" "'"; if the output device has directional single
quotes, groff's font descriptions will assign it to the glyph for U+2019
or similar.
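
As a rough illustration of the two stages described above (formatter
translations first, then the output device's choice of glyph), here is a
toy Python model. It is not how groff is implemented. The names
FORMATTER_TR, DEVICE_GLYPHS, emit, and man_override are invented for
this sketch, and treating "utf8" as a device with directional single
quotes is an assumption based on the "U+2019 or similar" remark.

```python
# Stage 1: formatter-level translations, mirroring the three `.tr`
# examples above.  Special characters are spelled as in roff input.
FORMATTER_TR = {
    "A": "B",        # .tr AB
    "-": r"\-",      # .tr -\-
    r"\[aq]": "'",   # .tr \[aq]'
}

# Stage 2: what each output device puts out for a given character.
# The `\-` entries follow the description quoted in this message; the
# utf8 entry for "'" is an assumption made for illustration.
DEVICE_GLYPHS = {
    "ascii": {r"\-": "\u002d"},
    "latin1": {r"\-": "\u002d"},
    "utf8": {r"\-": "\u2212", "'": "\u2019"},
}


def emit(char, device, man_override=False):
    """Return the code point a device would output for one input char.

    man_override stands in for the man(7)/mdoc(7) behavior of forcing
    `\\-` to U+002D regardless of device (hypothetical flag).
    """
    translated = FORMATTER_TR.get(char, char)
    if man_override and translated == r"\-":
        return "\u002d"
    # Characters the device table does not mention pass through as-is.
    return DEVICE_GLYPHS[device].get(translated, translated)


if __name__ == "__main__":
    for device in ("ascii", "latin1", "utf8"):
        glyph = emit(r"\-", device)
        print(f"{device}: U+{ord(glyph):04X}")
```

Calling emit(r"\-", "utf8", man_override=True) models the macro package
override mentioned in the quoted text.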
Some time perusing the 1.23.0 groff_char(7) and groff_font(5) man pages
will be rewarded. I hope one day soon to revise groff_out(5) and the
"Using Symbols" section of groff's Texinfo manual to my
satisfaction--the latter will likely drive updates to groff(7)--and by
then the path from input characters to visible output glyphs should be
completely illuminated.

If you were to call this stuff frustratingly complex, I'd agree. Most
of the complexity exists for good reasons, though some of those are
historical. The responsible update of technical documentation entails
unearthing and presenting those reasons.

> If it's gropdf(1) that makes the translation, I guess it will also be
> able to perform the same translation for MR.

The translation of `\-` to `-` specifically for the purpose of writing
PDF metadata (bookmarks) via troff device control commands is _extremely
specialized_. No man page author should ever have to deal with it.

> If the translation has already been made by troff(1), then gropdf(1)
> shouldn't care. In any case, I still don't see the problem.

If Deri and I do our jobs right, you won't need to care, nor see any
problems. We're workin' on it. (Mostly Deri has been, to date. My
"contribution" has mainly been to look at an.tmac on one hand and the
"pdfhref" macro on the other and stare slack-jawed, wondering how the
hell I'll ever get the impedances to match. Don't be surprised if
something gets refactored.)

> Indeed. I won't violently protest to Deri's experiments, as I do
> worse aberrations while experimenting, but I would if this went into
> groff(1). :)

I don't know what the current state of play with respect to a
four-argument `MR` in Deri's branch is. I'll let him speak to it. I'd
prefer not to undertake a code review half-cocked (and ill-prepared,
besides).

> > One of the selling points of `MR` is less typing (no parentheses).
>
> I wouldn't really buy it just for that.
;)

No indeed, which is why it has much bigger reasons to recommend it,
namely those cited in groff's NEWS file.

    Inclusion of the `MR` macro was prompted by its introduction to
    Plan 9 from User Space's troff in August 2020. Its purpose is to
    ameliorate several long-standing problems with man page cross
    references: (1) the package's lack of inherent hyperlink support
    for them; (2) false-positive identification of strings resembling
    man page cross references (as can happen with "exit(1)",
    "while(1)", "sleep(5)", "time(0)" and others) by terminal emulators
    and other programs; (3) the unwanted intrusion of hyphens into man
    page topics, which frustrates copy-and-paste operations (this
    problem has always been avoidable through use of the \% escape
    sequence, but cross references are frequent in man pages and some
    page authors are inexpert *roff users); and (4) deep divisions in
    man page maintenance communities over which typeface should be used
    to set the man page topic (italics, roman, or bold).

> In fact, not having a RM variant, it's more typing when (foo(1)).

It's exactly the same. Check these with `wc -c`.

All your questions are answered elsewhere
.RB ( foo (1) ).

All your questions are answered elsewhere (\c
.MR foo 1 ).

I admit that the mysterious and frightening `\c` escape sequence punches
well above its weight in key strokes, however.

I'm trying to fix that.

groff_man_style(7):
    \c     End a text line without inserting space or attempting a
           break. Normally, if filling is enabled, the end of a text
           line is treated like a space; an output line may be broken
           there (if not, an adjustable space is inserted); if filling
           is disabled, the line will be broken there, as in .EX/.EE
           examples. The next line is interpreted as usual and can
           include a macro call (contrast with \newline). \c is useful
           when three font styles are needed in a single word, as in a
           command synopsis.
                  .RB [ \-\-stylesheet=\c
                  .IR name ]

           It also helps when changing font styles in .EX/.EE examples,
           since they are not filled.

                  .EX
                  $ \c
                  .B groff \-T utf8 \-Z \c
                  .I file \c
                  .B | grotty \-i
                  .EE

           Alternatively, and perhaps with better portability, the \f
           font selection escape sequence can be used; see below.
           Using \c to continue a .TP paragraph tag across multiple
           input lines will render incorrectly with groff 1.22.3,
           mandoc 1.14.1, older versions of these programs, and perhaps
           with some other formatters.

I know that's a lot of words, but I want man page writers to use `\c`
without fear (and only) where it is necessary. Six years ago when I
raised the question on the groff list, no one could explain it to my
satisfaction. (That may be a "me" problem, not an "everyone else" one.
But other people have expressed unease with `\c` too.)

Incidentally: A gear finally turned for me just in the past day or two
when I realized that the only reason the man(7) package has font style
alternation macros in the first place is that the *roff `it` request
ignores `\c`. This was one of the first tricky groff issues I tried to
grapple with. Years later, I understand that it was a painful botch for
`it` to ignore `\c`, particularly when `ce` did not.[2] I felt like a
fool for not realizing this sooner, but I've never seen anyone else
explicate this point, so maybe I'm not the dunce I think. (Or I am, but
for some other reason. :P )

If Doc Brown loaned me his DeLorean, I'd go back and plead with Joe
Ossanna to fix this in the early days. If I could drive 88mph only back
to 1979, I'd ask Doug McIlroy to not use input traps in the man(7)
macros at all.

> I'd expect that the hyperlinking ability should be modifiable with
> groff(1) --I don't care at what level of the pipeline--, similar to
> how it was modifiable with man2html(1). But the source code shouldn't
> know about it.

Please point me to which man2html(1) implementation you mean.[3] I can
have a look and evaluate.
> I'd like groff(1) to figure out some name that resembles the text used
> as man page reference, or as section heading; I don't want to specify
> it.
[...]
> > Looking farther ahead, I think a further step is required if we're
> > going to have intra-page links; we're going to have to have a way to
> > disambiguate duplicates. In practice there's not much risk from
> > having duplicate section titles in man pages, but I reckon a big,
> > complex page could duplicate subsection titles. And if we
> > automatically generate hyperlink tags for paragraph tags, those
> > would likely need it as well. Maybe representing such internal
> > anchors hierarchically will be enough: "section_subsection_tag" or
> > something like that.
>
> Yep. I'd expect something like that. You could also include the page
> name in a book, which would involve the changes suggested by Deri of
> not having the page title hardcoded as the first level, right?

Yeah, we could put a page identifier (constructed from the first two
arguments to `TH`) as the first component of the anchor. Would this
even need to be conditional? Should we simply always do it?

Regards,
Branden

[1] Or, possibly, on top of it.

    https://savannah.gnu.org/bugs/?62691

[2] $ cat /tmp/it-vs-ce.tr
    .de AA
    BOOM!
    ..
    .
    .ie \n(.g .itc 1 AA
    .el .it 1 AA
    Tick. \c
    Tick.
    .ce 1
    Clean up \c
    debris.
    $ nroff --version | sed 1q
    GNU nroff (groff) version 1.23.0
    $ nroff /tmp/it-vs-ce.tr | sed 3q
    Tick. Tick. BOOM!
    Clean up debris.
    $ pdp11
    PDP-11 simulator V3.8-1
    sim> set cpu 11/45
    Disabling XQ
    sim> set tto 7b
    sim> att rl unix_v7_rl.dsk
    sim> boot rl
    @boot
    New Boot, known devices are hp ht rk rl rp tm vt
    : rl(0,0)rl2unix
    mem = 177856
    # Restricted rights: Use, duplication, or disclosure
    is subject to restrictions stated in your contract with
    Western Electric Company, Inc.
    Thu Sep 22 22:00:20 EDT 1988

    login: dmr
    $ nroff /tmp/it-vs-ce.tr | sed 3q
    Tick. BOOM! Tick.
    Clean up debris.

[3] https://invisible-island.net/scripts/man2html.html#same-name
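
On the intra-page anchor question discussed above, the hierarchical
"section_subsection_tag" idea, with a page identifier as the first
component and a fallback for duplicates, might look roughly like the
following Python sketch. Nothing here reflects actual groff or gropdf
code. The names slug and AnchorNamer, the "_" separator, and the
numeric suffix used for duplicates are all assumptions chosen for
illustration.

```python
import re


def slug(text):
    """Lowercase text and collapse runs of other characters to '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


class AnchorNamer:
    """Build hierarchical anchor names for one man page (hypothetical)."""

    def __init__(self, title, section):
        # Page identifier built from the first two `TH` arguments.
        self.page = slug(f"{title} {section}")
        self.seen = {}

    def anchor(self, *path):
        """Name an anchor from its heading path, e.g. section, subsection."""
        base = "_".join([self.page, *(slug(part) for part in path)])
        count = self.seen.get(base, 0) + 1
        self.seen[base] = count
        # Only a repeated name gets a numeric suffix.
        return base if count == 1 else f"{base}_{count}"


if __name__ == "__main__":
    namer = AnchorNamer("printf", "3")
    print(namer.anchor("DESCRIPTION"))
    print(namer.anchor("DESCRIPTION", "Conversion specifiers"))
    print(namer.anchor("DESCRIPTION", "Conversion specifiers"))
```

Always including the page component keeps anchors unique when several
pages are collected into one document, at the cost of longer names.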