[adding Colin Watson to CC; and the groff list because I started musing]

Hi Alex,

At 2024-11-01T21:07:29+0100, Alejandro Colomar wrote:
> > > > -/proc/pid/fdinfo/ \- information about file descriptors
> > > > +.IR /proc/ pid /fdinfo " \- information about file descriptors"
> > >
> > > I wouldn't add formatting here for now. That's something I prefer
> > > to be cautious about, and if we do it, we should do it in a
> > > separate commit.
> >
> > I'll move it to a separate patch. Is the caution due to a lack of
> > test infrastructure? That could be something to get resolved,
> > perhaps through Google summer-of-code and the like.
>
> That change might be controversial.

Then let those with objections step forward and make them!

(I may be one of them; see below.)

> We'd first need to check that all software that reads the NAME section
> would behave well for this.

Not _all_ software, surely. Anybody can write a craptastic man(7)
scraper, and several have, mainly back when Web 1.0 was going to eat the
world. Most of those have withered on the vine.

This is the _Linux_ man-pages project, so what matters are (1) man page
formatters and (2) man page indexers that GNU/Linux systems actually
use.

Where people get nervous with the "NAME" section is because of the
indexer; if one's man(7) _formatter_ can't handle an `IR` call, it
hasn't earned the name.

Here's a sample input.

$ cat /tmp/proc_pid_fdinfo_mini.5
.TH proc_pid_fdinfo_mini 5 2024-11-02 "example"
.SH Name
.IR /proc/ pid /fdinfo " \- information about file descriptors"
.SH Description
Text text text text.

Starting with formatters, let's see how they do.

$ nroff -man /tmp/proc_pid_fdinfo_mini.5
proc_pid_fdinfo_mini(5)      File Formats Manual       proc_pid_fdinfo_mini(5)

Name
       /proc/pid/fdinfo - information about file descriptors

Description
       Text text text text.

example                         2024‐11‐02             proc_pid_fdinfo_mini(5)

$ mandoc /tmp/proc_pid_fdinfo_mini.5 | ul
proc_pid_fdinfo_mini(5)      File Formats Manual       proc_pid_fdinfo_mini(5)

Name
       /proc/pid/fdinfo - information about file descriptors

Description
       Text text text text.

example                         2024-11-02             proc_pid_fdinfo_mini(5)

$ ~/heirloom/bin/nroff -man /tmp/proc_pid_fdinfo_mini.5 | ul
proc_pid_fdinfo_mini(5)      File Formats Manual       proc_pid_fdinfo_mini(5)

Name
       /proc/pid/fdinfo - information about file descriptors

Description
       Text text text text.

example                         2024-11-02             proc_pid_fdinfo_mini(5)

$ DWBHOME=~/dwb ~/dwb/bin/nroff -man /tmp/proc_pid_fdinfo_mini.5 | cat -s | ul
proc_pid_fdinfo_mini(5)example (2024-11-02)roc_pid_fdinfo_mini(5)

Name
       /proc/pid/fdinfo - information about file descriptors

Description
       Text text text text.

Page 1                                                  (printed 11/2/2024)

I leave the execution of these to perceive the correct font style
changes as an exercise for the reader, but they all get the
"/proc/pid/fdinfo" line right.

On GNU/Linux systems, the only man page indexer I know of is Colin
Watson's man-db--specifically, its mandb(8) program. But it's nicely
designed so that the "topic and summary description extraction" task is
delegated to a standalone tool, lexgrog(1), and we can use that.

$ lexgrog /tmp/proc_pid_fdinfo_mini.5
/tmp/proc_pid_fdinfo_mini.5: parse failed

Oh, damn. I wasn't expecting that. Maybe this is what defeats Michael
Kerrisk's scraper with respect to groff's man pages.[1]

Well, I can find a silver lining here, because it gives me an even
better reason than I had to pitch an idea I've been kicking around for a
while.

Why not enhance groff man(7) to support a mode where _it_ will spit out
the "Name"/"NAME" section, and only that, _for_ you?

This would be as easy as checking for an option, say '-d EXTRACT=Name',
and having the package's "TH" and "SH" macro definitions divert
(literally, with the `di` request) everything _except_ the section of
interest to a diversion that is then never called/output.
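
To make the intended result concrete, here is a rough Python sketch of
the selection logic only. It is not how groff would do it (that would
live in the macro package and use `di`), and it is exactly the kind of
naive scraper disparaged above: it recognizes nothing but `SH` calls and
returns raw input lines rather than formatted text. The function name
`extract_section` and the case-insensitive title match are assumptions
made for illustration.

```python
SAMPLE = """\
.TH proc_pid_fdinfo_mini 5 2024-11-02 "example"
.SH Name
.IR /proc/ pid /fdinfo " \\- information about file descriptors"
.SH Description
Text text text text.
"""


def extract_section(source, wanted):
    """Return the raw input lines of the section titled `wanted`.

    Hypothetical helper: lines outside the wanted section are simply
    dropped, standing in for a diversion that is never output.
    Titles are compared case-insensitively (an assumption, so that
    "Name" and "NAME" both match).
    """
    kept = []
    in_wanted = False
    for line in source.splitlines():
        parts = line.split(None, 1)
        if parts and parts[0] == ".SH":
            # A new section starts; decide whether to keep its body.
            title = parts[1].strip().strip('"') if len(parts) > 1 else ""
            in_wanted = title.casefold() == wanted.casefold()
            continue
        if in_wanted:
            kept.append(line)
    return kept


if __name__ == "__main__":
    for line in extract_section(SAMPLE, "Name"):
        print(line)
```

Run on the sample page, this prints the unformatted `IR` line. The point
of doing the job inside the formatter instead is that the macro call
would be interpreted rather than echoed.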
(This is similar to an m4 feature known as the "black hole diversion".)

All of the features necessary to implement this[2] were part of troff as
far back as the birth of the man(7) package itself. It's not clear to
me why it wasn't done back in the 1980s.

lexgrog(1) itself will of course have to stay around for years to come,
but this could take a significant distraction off of Colin's plate--I
believe I have seen him grumble about how much *roff syntax he has to
parse to have the feature be workable, and that's without upstart groff
maintainers exploring up to every boundary that existed even in 1979 and
cheerfully exercising their findings in man pages.

I also of course have ideas for generalizing the feature, so that you
can request any (sub)section by name, and, with a bit more ambition,[4]
paragraph tags (`TP`) too.

So you could do things like:

nroff -man -d EXTRACT="RETURN VALUE" man3/bsearch.3

and:

nroff -man -d EXTRACT="OPTIONS/-b" man8/zic.8

...does this sound appetizing to anyone?

> Also, many other pages might need to be changed accordingly for
> consistency.

I withdraw the suggestion until lexgrog(1) flexes its own muscles, or
has groff(1) do the lifting. I'm sorry for prompting churn, Ian.

> No, this isn't outdated, since that reduces the quality of the diff.
> Also, I review a lot of patches in the mail client, without running
> git(1). And it's not just for reviewing diffs, but also for writing
> them. Semantic newlines reduce the amount of work for producing the
> diffs.

It's a real win for diffs. Here's a very recent example from groff.

diff --git a/man/groff.7.man b/man/groff.7.man
index 1fb635f2b..1d248b237 100644
--- a/man/groff.7.man
+++ b/man/groff.7.man
@@ -1281,6 +1281,7 @@ .SH Identifiers
 typeface,
 color,
 special character or character class,
+hyphenation language code,
 environment,
 or stream.
 .

(So recent that in fact I haven't pushed that yet.)

Lists like the foregoing are common in man pages.

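The difference can be measured in miniature. The following Python
sketch (not part of groff or the man-pages tooling) inserts the same
list item into the text laid out two ways: with semantic newlines, as
in the diff above, and filled into a paragraph. The 40-column fill
width and the helper name `changed_lines` are assumptions chosen for
illustration.

```python
import difflib
import textwrap

# Semantic newlines: one list item per input line.
semantic_old = [
    "typeface,",
    "color,",
    "special character or character class,",
    "environment,",
    "or stream.",
]
semantic_new = (
    semantic_old[:3] + ["hyphenation language code,"] + semantic_old[3:]
)

# The same words filled to a fixed width (assumed: 40 columns).
filled_old = textwrap.wrap(" ".join(semantic_old), width=40)
filled_new = textwrap.wrap(" ".join(semantic_new), width=40)


def changed_lines(old, new):
    """Count added and removed lines in a unified diff, skipping the
    '---' and '+++' file header lines."""
    count = 0
    for line in difflib.unified_diff(old, new, lineterm=""):
        if line.startswith(("---", "+++")):
            continue
        if line.startswith(("+", "-")):
            count += 1
    return count


if __name__ == "__main__":
    print("semantic newlines:", changed_lines(semantic_old, semantic_new))
    print("filled paragraph: ", changed_lines(filled_old, filled_new))
```

With semantic newlines the change is a single added line. In the filled
layout the insertion reflows the text after it, so the diff touches
more lines than the one that actually changed.
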
Regards,
Branden

[1] https://man7.org/linux/man-pages/dir_by_project.html#groff

[2] String definitions, "string comparisons"[3], and diversions.

[3] strictly, "formatted output comparisons"

    https://www.gnu.org/software/groff/manual/groff.html.node/Operators-in-Conditionals.html

    You can do stricter string comparisons in GNU troff. And I've
    thought of some syntactic sugar for performing them that wouldn't
    break backward compatibility.

[4] To really land the feature, we need automatic tag generation from
    input text (we don't want to make the man page author construct
    their own tags). Another reason we want the construction to be
    automatic is to make the tags unique when multiple man pages are
    formatted in one run, as one might do when making a book of man
    pages. Automatic tagging will also enable the slaying of two other
    ancient dragons.

    1. deep internal links for PDF bookmarks
    2. pod2man's `IX`-happy output; the widespread use of this
       nonstandard macro confuses way too many novice page authors,
       and bloats document size.

    Another feature we'll really want to do this right is improved
    string processing facilities. That, too, is something that will
    pay dividends in several areas. With a proper string iterator in
    the formatter (and a couple more conditional operators),[5] it will
    be possible to write a string library as a macro file, slimming
    down the formatter itself a little and making macro writers' lives
    easier. We're only two days into the month and this has already
    come up on the groff list.

    https://lists.gnu.org/archive/html/groff/2024-11/msg00002.html

[5] https://savannah.gnu.org/bugs/?62264