[CCing groff@gnu list because some problems arise here that merit being
findable by search of its list archives]

Hi Deri,

At 2025-02-17T18:52:46+0000, Deri wrote:
> > programs in constructed pipeline:
> >
> > GNU grops (groff) version 1.23.0.2695-49927
> > GNU troff (groff) version 1.23.0.2695-49927
[...]
> Since the v10 pages are intended to run on a version of troff with a
> two character name limit (I think). Code such as ".ne4" cause a
> problem for groff, which needs ".ne 4" to work (otherwise groff looks
> for a macro called "ne4" and fails. Many of these issues are now
> corrected.

We do have compatibility mode to support old-style AT&T troff input.

troff(1):
     -C     Enable AT&T troff compatibility mode; implies -c. See
            groff_diff(7).

However... [skipping ahead]

> but changing some "$" to "\[Do]" fixed the problem.

...if you're doing that, you foreclose use of `\[Do]` for 2 reasons.

1. That syntax is a groff extension (the AT&T troff form would be
   `\(Do`)...but worse...

2. `Do` is not a special character identifier generally recognized by
   AT&T-family troffs. And there's no way within the AT&T *roff
   language to define new ones. Fortunately, in Kernighan troff, it's
   not hard to add them to font description files. As long as you have
   superuser privileges.

> A strange issue is that if a page contained a "$" character it sent
> eqn into the stratosphere (thinking was dealing with an inline
> equation), I killed it when eqn chewed up over 24gb of memory. I have
> no idea why, and it is not triggered by a single page containing a
> "$", so it must be triggered by something in an earlier man page which
> triggers it, but changing some "$" to "\[Do]" fixed the problem.

I surmise that this book building system either runs groff with the `-e`
option, or pipes the pages through eqn(1) explicitly, so that every page
gets preprocessed by eqn. That's not wrong--in fact it's probably the
sanest thing to do--but it does expose you to scenarios like this.

I'd bet a U.S.
50-cent piece that some page had this in it:

.EQ
...
delim $$
...
.EN

and then never did this later:

.EQ
...
delim off
...
.EN

...because who ever formats more than one man page at a time?

So upon encountering a `$` in an eqnless man page later, the eqn
preprocessor would indeed then start gobbling up the entire remainder of
the input for attempted conversion to troff input.

GNU eqn added an option that strongly mitigates this and another
problem:

eqn(1):
     -N     Prohibit newlines within delimiters, allowing eqn to recover
            better from missing closing delimiters.

...and the groff(1) front-end exposes it too, for convenience:

groff(1):
     -N     Prohibit newlines between eqn delimiters: pass -N to eqn(1).

...however before reaching for this solution, the corpus of pages being
formatted needs to be audited to ensure that no multiline, inline use of
eqn is attempted. If it is, the pages must be altered to either:

1. stop doing that--maybe by joining lines--enabling use of `-N`;

2. migrate the "inline" math to EQ/EN bracketing (groff man(7) doesn't
   define `EQ` and `EN` to set the math as a display, so this _should_
   work okay), also enabling use of `-N`; or

3. find the spot where `delim off` should have been and add it.

> One page redefined the ".P" man macro, which then affects all
> following man pages.

Naughty, naughty!

I've wondered in the past about adding support for "burning it all down
and redefining all interface macros" in groff's "an.tmac" (specifically
when hitting a new `TH`).[1] But I decided that people wouldn't believe
me that this was a practical hazard. Thanks for pointing me to a
real-world case! :D

> One page introduced a string register called "mc" which then masks the
> groff command ".mc" with very strange results.

That's not just a groff request name, but an AT&T one. Hard to imagine
how that isn't a bug, or at least a deeply unwise practice.
People might want to use {g,}diffmk(1) on man pages, and trashing the
mechanism for setting up the margin character defeats such usage.

Unfortunately man page authorship culture did not evolve in a direction
such that people making changes to the formatter's environment (in the
broad sense, not the *roff concept) put things back the way they found
them. Approximately every man page is written in the expectation that
the formatter will exit once the last line of _this_ man page document
is read.

Just like how you don't need to bother to free heap-allocated memory in
your programs unless you think _you'll_ need it. It's the free store!
Grab as much as you want and forget about it! When your process dies
the OS will reclaim it all anyway, no harm, no foul.

It's no wonder Unix culture produced so many code cowboys.

> Font L is used in many entries, no clue what font this is, but I
> convert to font CB. Please change to taste (see lines 130 onwards).

Good call. `L` (presumably abbreviating "literal") was a latter-day
Research Unix convention for font and macro names that I have not seen
in materials originating outside the 1980s CSRC. AT&T Documenter's
Workbench (~1984-~1994), for example, did not appear to embrace it.

> Several pages use lower case macro names, i.e. ".th" rather than
> ".TH".

Wow. Those could be hangovers from pre-Seventh Edition Unix "man". But
I thought Doug McIlroy got all of those ported/rewritten for Seventh
Edition.

Nevertheless, at least System III,[2] v8, and v10 retained support for
Sixth Edition style man pages. For example:

$ head -n 5 v8/usr/lib/macros/an
'''\" PWB Manual Entry Macros - 1.36 of 11/11/80
'''\" Nroff/Troff Version @(#)1.36
.deth
.tmwrong version of man entry macros - use -man6
.ab

So be careful out there if you don't want Dave Mustaine to snarl at you!

> I have "fixed" a lot of the problems but there are still many warnings
> when running groff. I have attached two patches, one for the V10 man
> pages, and one for prepare.pl.
> You should be able to produce a "useful" book after applying these.
>
> If you wish to see the fruits of my labour as a pdf, it is here:-
>
> http://chuzzlewit.co.uk/UnixV10.pdf

This looks really good! It's wonderful to see a working, useful
navigation pane, and at least some internal hyperlinks are working.
Some aren't, and at a glance it's not obvious to me why. (It's not the
first argument to `TH` being in shouting capitals that hoses things, and
that's not practiced with 100% reliability anyway--see as80(1) and
ld80(1), for example.)

In fact those two pages are weird in a few respects. Obvious spelling
errors on the one hand ("moduals"?), and the latter uses a really old
Unix manual convention, identifying the section numbers with roman
numerals.

Where modernization for PDF rendering purposes stops and the Research
Tenth Edition Programmer's Manual, Volume 1 editorial effort begins anew
may prove a difficult boundary to draw.

Regards,
Branden

[1] One bad approach, IMO, would be to define all interface macros
    except `TH` _inside_ its own definition. Apart from being
    super-disruptive for change tracking purposes, since it would touch
    nearly every line in the macro file, I would expect this to be
    harder to understand and maintain. Nested macro definitions are
    fully countenanced by the *roff language but not, I think, a widely
    mastered technique.

    Better, I think, would be to define all interface macros using
    "long names", like `an*SH`, and then have `TH` redeclare the public
    names as aliases, as in `.als SH an*SH`. Care and testing would be
    required, as "andoc.tmac" uses the same technique to permit
    switching between man(7) and mdoc(7) input. I am therefore not in a
    hurry to pick up this task, even though we do already have automated
    tests to detect failure of such switching.

[2] But not, interestingly, System V.

    https://github.com/ryanwoodsmall/oldsysv/
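
A rough sketch of the `delim` audit suggested above, in case it helps
locate the page that leaves eqn delimiters switched on. The function
name `audit_delim` is made up for illustration, and the heuristic is an
assumption: it only looks at lines beginning with `delim` between `.EQ`
and `.EN`, and it assumes one `delim` statement per line. It does not
run eqn at all, so treat its output as a list of suspects rather than a
verdict.

```shell
#!/bin/sh
# Hypothetical helper (not part of groff): report each page whose last
# eqn "delim" statement is something other than "delim off".
audit_delim() {
    for page in "$@"; do
        awk '
            # Track whether we are inside an EQ/EN region.
            /^\.EQ/ { in_eq = 1; next }
            /^\.EN/ { in_eq = 0; next }
            # "delim off" disables inline delimiters again.
            in_eq && /^[ \t]*delim[ \t]+off[ \t]*$/ { active = 0; next }
            # Any other "delim xy" turns them on; remember where.
            in_eq && /^[ \t]*delim[ \t]+[^ \t]/ { active = 1; where = FNR }
            END {
                if (active)
                    printf "%s:%d: delim never turned off\n", FILENAME, where
            }
        ' "$page"
    done
}
```

Calling it as `audit_delim` followed by the list of page files prints
one line per suspect page, giving the file name and the line number of
the `delim` statement that was left in force.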