Hi Branden,

On Sat, Nov 02, 2024 at 05:08:37AM -0500, G. Branden Robinson wrote:
> [adding Colin Watson to CC; and the groff list because I started musing]
>
> Hi Alex,
>
> At 2024-11-01T21:07:29+0100, Alejandro Colomar wrote:
> > > > > -/proc/pid/fdinfo/ \- information about file descriptors
> > > > > +.IR /proc/ pid /fdinfo " \- information about file descriptors"
> > > >
> > > > I wouldn't add formatting here for now. That's something I prefer
> > > > to be cautious about, and if we do it, we should do it in a
> > > > separate commit.
> > >
> > > I'll move it to a separate patch. Is the caution due to a lack of
> > > test infrastructure? That could be something to get resolved,
> > > perhaps through Google summer-of-code and the like.
> >
> > That change might be controversial.
>
> Then let those with objections step forward and make them!

Sure! But that in itself (and the length of your mail) makes a strong
reason to have this in a separate commit. :)

I'm not opposed to the change. Only cautious.

>
> (I may be one of them; see below.)
>
> > We'd first need to check that all software that reads the NAME section
> > would behave well for this.
>
> Not _all_ software, surely. Anybody can write a craptastic man(7)
> scraper, and several have, mainly back when Web 1.0 was going to eat the
> world. Most of those have withered on the vine.

Ahh, yeah, I committed the same mistake I criticise in others every now
and then. $all does not really mean "all". (-Wall, `make all`, ...)

I meant all [of which I care], which is basically groff(1) and
mandoc(1). :)

> This is the _Linux_ man-pages project, so what matters are (1) man page
> formatters and (2) man page indexers that GNU/Linux systems actually
> use. Where people get nervous with the "NAME" section is because of the
> indexer; if one's man(7) _formatter_ can't handle an `IR` call, it
> hasn't earned the name.

Yup.

>
> Here's a sample input.
>
>     $ cat /tmp/proc_pid_fdinfo_mini.5
>     .TH proc_pid_fdinfo_mini 5 2024-11-02 "example"
>     .SH Name
>     .IR /proc/ pid /fdinfo " \- information about file descriptors"
>     .SH Description
>     Text text text text.
>
> Starting with formatters, let's see how they do.
>
>     $ nroff -man /tmp/proc_pid_fdinfo_mini.5
>     proc_pid_fdinfo_mini(5) File Formats Manual proc_pid_fdinfo_mini(5)
>
>     Name
>            /proc/pid/fdinfo - information about file descriptors
>
>     Description
>            Text text text text.
>
>     example 2024‐11‐02 proc_pid_fdinfo_mini(5)
>     $ mandoc /tmp/proc_pid_fdinfo_mini.5 | ul
>     proc_pid_fdinfo_mini(5) File Formats Manual proc_pid_fdinfo_mini(5)
>
>     Name
>            /proc/pid/fdinfo - information about file descriptors
>
>     Description
>            Text text text text.
>
>     example 2024-11-02 proc_pid_fdinfo_mini(5)
>     $ ~/heirloom/bin/nroff -man /tmp/proc_pid_fdinfo_mini.5 | ul
>     proc_pid_fdinfo_mini(5) File Formats Manual proc_pid_fdinfo_mini(5)
>
>
>
>     Name
>            /proc/pid/fdinfo - information about file descriptors
>
>     Description
>            Text text text text.
>
>
>
>     example 2024-11-02 proc_pid_fdinfo_mini(5)
>     $ DWBHOME=~/dwb ~/dwb/bin/nroff -man /tmp/proc_pid_fdinfo_mini.5 | cat -s | ul
>
>     proc_pid_fdinfo_mini(5)example (2024-11-02)roc_pid_fdinfo_mini(5)
>
>     Name
>            /proc/pid/fdinfo - information about file descriptors
>
>     Description
>            Text text text text.
>
>     Page 1 (printed 11/2/2024)
>
> I leave the execution of these to perceive the correct font style
> changes as an exercise for the reader, but they all get the
> "/proc/pid/fdinfo" line right.
>
> On GNU/Linux systems, the only man page indexer I know of is Colin
> Watson's man-db--specifically, its mandb(8) program. But it's nicely
> designed so that the "topic and summary description extraction" task is
> delegated to a standalone tool, lexgrog(1), and we can use that.
>
>     $ lexgrog /tmp/proc_pid_fdinfo_mini.5
>     /tmp/proc_pid_fdinfo_mini.5: parse failed
>
> Oh, damn. I wasn't expecting that.
> Maybe this is what defeats Michael
> Kerrisk's scraper with respect to groff's man pages.[1]
>
> Well, I can find a silver lining here, because it gives me an even
> better reason than I had to pitch an idea I've been kicking around for a
> while. Why not enhance groff man(7) to support a mode where _it_ will
> spit out the "Name"/"NAME" section, and only that, _for_ you?
>
> This would be as easy as checking for an option, say '-d EXTRACT=Name',
> and having the package's "TH" and "SH" macro definitions divert
> (literally, with the `di` request) everything _except_ the section of
> interest to a diversion that is then never called/output. (This is
> similar to an m4 feature known as the "black hole diversion".)

Sounds good. And then lexgrog(1) would be a one-liner that calls
groff(1) with the appropriate flag, right?

> All of the features necessary to implement this[2] were part of troff as
> far back as the birth of the man(7) package itself. It's not clear
> to me why it wasn't done back in the 1980s.

Not enough energy of activation, probably, as with most stuff.

> lexgrog(1) itself will of course have to stay around for years to come,

You can make it a wrapper around groff(1) with flags, no?

> but this could take a significant distraction off of Colin's plate--I
> believe I have seen him grumble about how much *roff syntax he has to
> parse to have the feature be workable, and that's without upstart groff
> maintainers exploring up to every boundary that existed even in 1979 and
> cheerfully exercising their findings in man pages.
>
> I also of course have ideas for generalizing the feature, so that you
> can request any (sub)section by name, and, with a bit more ambition,[4]
> paragraph tags (`TP`) too.
>
> So you could do things like:
>
>     nroff -man -d EXTRACT="RETURN VALUE" man3/bsearch.3

I certainly use this.

# man_section() prints specific manual page sections (DESCRIPTION, SYNOPSIS,
# ...)
# of all manual pages in a directory (or in a single manual page file).
# Usage example: .../man-pages$ man_section man2 SYNOPSIS 'SEE ALSO';

man_section()
{
    if [ $# -lt 2 ]; then
        >&2 echo "Usage: ${FUNCNAME[0]} <dir> <section>...";
        return $EX_USAGE;
    fi

    local page="$1";
    shift;
    local sect="$*";

    find "$page" -type f \
    |xargs wc -l \
    |grep -v -e '\b1 ' -e '\btotal\b' \
    |awk '{ print $2 }' \
    |sort \
    |while read -r manpage; do
        (sed -n '/^\.TH/,/^\.SH/{/^\.SH/!p}' <"$manpage";
         for s in $sect; do
            <"$manpage" \
            sed -n \
                -e "/^\.SH $s/p" \
                -e "/^\.SH $s/,/^\.SH/{/^\.SH/!p}";
         done;) \
        |mandoc -Tutf8 2>/dev/null \
        |col -pbx;
    done;
}

# man_lsfunc() prints the name of all C functions declared in the SYNOPSIS
# of all manual pages in a directory (or in a single manual page file).
# Each name is printed in a separate line
# Usage example: .../man-pages$ man_lsfunc man2;

man_lsfunc()
{
    if [ $# -lt 1 ]; then
        >&2 echo "Usage: ${FUNCNAME[0]} <manpage|manNdir>...";
        return $EX_USAGE;
    fi

    for arg in "$@"; do
        man_section "$arg" 'SYNOPSIS';
    done \
    |sed_rm_ccomments \
    |pcregrep -Mn '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]*?(...)?\s*\); *$' \
    |grep '^[0-9]' \
    |sed -E 's/syscall\(SYS_(\w*),?/\1(/' \
    |sed -E 's/^[^(]+ \**(\w+)\(.*/\1/' \
    |uniq;
}

# man_lsvar() prints the name of all C variables declared in the SYNOPSIS
# of all manual pages in a directory (or in a single manual page file).
# Each name is printed in a separate line
# Usage example: .../man-pages$ man_lsvar man3;

man_lsvar()
{
    if [ $# -lt 1 ]; then
        >&2 echo "Usage: ${FUNCNAME[0]} <manpage|manNdir>...";
        return $EX_USAGE;
    fi

    for arg in "$@"; do
        man_section "$arg" 'SYNOPSIS';
    done \
    |sed_rm_ccomments \
    |pcregrep -Mv '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]+?(...)?\s*\); *$' \
    |pcregrep -Mn \
        -e '(?s)^ +extern [\w ]+ \**\(\*+[\w ]+\)\([\w\s(,)[\]*]+?\s*\); *$' \
        -e '^ +extern [\w ]+ \**[\w ]+; *$' \
    |grep '^[0-9]' \
    |grep -v 'typedef' \
    |sed -E 's/^[0-9]+: +extern [^(]+ \**\(\*+(\w* )?(\w+)\)\(.*/\2/' \
    |sed 's/^[0-9]\+: \+extern .* \**\(\w\+\); */\1/' \
    |uniq;
}

Even grepc(1) derived from those scripts.

>
> and:
>
>     nroff -man -d EXTRACT="OPTIONS/-b" man8/zic.8

While I haven't used this yet, it's probably because it's quite complex
to implement with regexes, not because it wouldn't be useful.

>
> ...does this sound appetizing to anyone?

Certainly.

> > Also, many other pages might need to be changed accordingly for
> > consistency.
>
> I withdraw the suggestion until lexgrog(1) flexes its own muscles, or
> has groff(1) do the lifting. I'm sorry for prompting churn, Ian.
>
> > No, this isn't outdated, since that reduces the quality of the diff.
> > Also, I review a lot of patches in the mail client, without running
> > git(1). And it's not just for reviewing diffs, but also for writing
> > them. Semantic newlines reduce the amount of work for producing the
> > diffs.
>
> It's a real win for diffs.

And diffs are a real win for text. Thus, semantic newlines are a real
win for text. "Write poems, not prose." (Any chance we may get that
warning added to groff(1)? :D)

Cheers,
Alex

>
> Here's a very recent example from groff.
>
>     diff --git a/man/groff.7.man b/man/groff.7.man
>     index 1fb635f2b..1d248b237 100644
>     --- a/man/groff.7.man
>     +++ b/man/groff.7.man
>     @@ -1281,6 +1281,7 @@ .SH Identifiers
>      typeface,
>      color,
>      special character or character class,
>     +hyphenation language code,
>      environment,
>      or stream.
>      .
>
>
> (So recent that in fact I haven't pushed that yet.)
>
> Lists like the foregoing are common in man pages.
>
> Regards,
> Branden
>
> [1] https://man7.org/linux/man-pages/dir_by_project.html#groff
> [2] String definitions, "string comparisons"[3], and diversions.
> [3] strictly, "formatted output comparisons"
>
>     https://www.gnu.org/software/groff/manual/groff.html.node/Operators-in-Conditionals.html
>
>     You can do stricter string comparisons in GNU troff. And I've
>     thought of some syntactic sugar for performing them that wouldn't
>     break backward compatibility.
>
> [4] To really land the feature, we need automatic tag generation from
>     input text (we don't want to make the man page author construct
>     their own tags). Another reason we want the construction to be
>     automatic is to make the tags unique when multiple man pages are
>     formatted in one run, as one might do when making a book of man
>     pages. Automatic tagging will also enable the slaying of two other
>     ancient dragons.
>
>     1. deep internal links for PDF bookmarks
>     2. pod2man's `IX`-happy output; the widespread use of this
>        nonstandard macro confuses way too many novice page authors, and
>        bloats document size.
>
>     Another feature we'll really want to do this right is improved string
>     processing facilities. That, too, is something that will pay
>     dividends in several areas. With a proper string iterator in the
>     formatter (and a couple more conditional operators),[5] it will be
>     possible to write a string library as a macro file, slimming down the
>     formatter itself a little and making macro writers' lives easier.
>     We're only two days into the month and this has already come up on
>     the groff list.
>
>     https://lists.gnu.org/archive/html/groff/2024-11/msg00002.html
>
> [5] https://savannah.gnu.org/bugs/?62264

--
<https://www.alejandro-colomar.es/>