Re: [PATCH v2 1/3] proc_pid_fdinfo.5: Reduce indent for most of the page

Alejandro Colomar <alx@xxxxxxxxxx> · Sat, 2 Nov 2024 11:39:37 +0100

Hi Branden,

On Sat, Nov 02, 2024 at 05:08:37AM -0500, G. Branden Robinson wrote:
> [adding Colin Watson to CC; and the groff list because I started musing]
> 
> Hi Alex,
> 
> At 2024-11-01T21:07:29+0100, Alejandro Colomar wrote:
> > > > > -/proc/pid/fdinfo/ \- information about file descriptors
> > > > > +.IR /proc/ pid /fdinfo " \- information about file descriptors"
> > > >
> > > > I wouldn't add formatting here for now.  That's something I prefer
> > > > to be cautious about, and if we do it, we should do it in a
> > > > separate commit.
> > > 
> > > I'll move it to a separate patch. Is the caution due to a lack of
> > > test infrastructure? That could be something to get resolved,
> > > perhaps through Google summer-of-code and the like.
> > 
> > That change might be controversial.
> 
> Then let those with objections step forward and make them!

Sure!  But that in itself (and the length of your mail) makes a strong
reason to have this in a separate commit.  :)

I'm not opposed to the change.  Only cautious.

> 
> (I may be one of them; see below.)
> 
> > We'd first need to check that all software that reads the NAME section
> > would behave well for this.
> 
> Not _all_ software, surely.  Anybody can write a craptastic man(7)
> scraper, and several have, mainly back when Web 1.0 was going to eat the
> world.  Most of those have withered on the vine.

Ahh, yeah, I committed the same mistake I criticise in others every now
and then.  $all does not really mean "all".  (-Wall, `make all`, ...)

I meant all [of which I care], which is basically groff(1) and
mandoc(1).  :)

> This is the _Linux_ man-pages project, so what matters are (1) man page
> formatters and (2) man page indexers that GNU/Linux systems actually
> use.  Where people get nervous with the "NAME" section is because of the
> indexer; if one's man(7) _formatter_ can't handle an `IR` call, it
> hasn't earned the name.

Yup.

> 
> Here's a sample input.
> 
> $ cat /tmp/proc_pid_fdinfo_mini.5
> .TH proc_pid_fdinfo_mini 5 2024-11-02 "example"
> .SH Name
> .IR /proc/ pid /fdinfo " \- information about file descriptors"
> .SH Description
> Text text text text.
> 
> Starting with formatters, let's see how they do.
> 
> $ nroff -man /tmp/proc_pid_fdinfo_mini.5
> proc_pid_fdinfo_mini(5)       File Formats Manual      proc_pid_fdinfo_mini(5)
> 
> Name
>        /proc/pid/fdinfo - information about file descriptors
> 
> Description
>        Text text text text.
> 
> example                           2024‐11‐02           proc_pid_fdinfo_mini(5)
> $ mandoc /tmp/proc_pid_fdinfo_mini.5 | ul
> proc_pid_fdinfo_mini(5)       File Formats Manual      proc_pid_fdinfo_mini(5)
> 
> Name
>        /proc/pid/fdinfo - information about file descriptors
> 
> Description
>        Text text text text.
> 
> example                           2024-11-02           proc_pid_fdinfo_mini(5)
> $ ~/heirloom/bin/nroff -man /tmp/proc_pid_fdinfo_mini.5 | ul
> proc_pid_fdinfo_mini(5)       File Formats Manual      proc_pid_fdinfo_mini(5)
> 
> 
> 
> Name
>        /proc/pid/fdinfo - information about file descriptors
> 
> Description
>        Text text text text.
> 
> 
> 
> example                           2024-11-02           proc_pid_fdinfo_mini(5)
> $ DWBHOME=~/dwb ~/dwb/bin/nroff -man /tmp/proc_pid_fdinfo_mini.5 | cat -s | ul
> 
>        proc_pid_fdinfo_mini(5)example (2024-11-02)roc_pid_fdinfo_mini(5)
> 
>        Name
>             /proc/pid/fdinfo - information about file descriptors
> 
>        Description
>             Text text text text.
> 
>        Page 1                                        (printed 11/2/2024)
> 
> I leave the execution of these to perceive the correct font style
> changes as an exercise for the reader, but they all get the
> "/proc/pid/fdinfo" line right.
> 
> On GNU/Linux systems, the only man page indexer I know of is Colin
> Watson's man-db--specifically, its mandb(8) program.  But it's nicely
> designed so that the "topic and summary description extraction" task is
> delegated to a standalone tool, lexgrog(1), and we can use that.
> 
> $ lexgrog /tmp/proc_pid_fdinfo_mini.5
> /tmp/proc_pid_fdinfo_mini.5: parse failed
> 
> Oh, damn.  I wasn't expecting that.  Maybe this is what defeats Michael
> Kerrisk's scraper with respect to groff's man pages.[1]
> 
> Well, I can find a silver lining here, because it gives me an even
> better reason than I had to pitch an idea I've been kicking around for a
> while.  Why not enhance groff man(7) to support a mode where _it_ will
> spit out the "Name"/"NAME" section, and only that, _for_ you?
> 
> This would be as easy as checking for an option, say '-d EXTRACT=Name',
> and having the package's "TH" and "SH" macro definitions divert
> (literally, with the `di` request) everything _except_ the section of
> interest to a diversion that is then never called/output.  (This is
> similar to an m4 feature known as the "black hole diversion".)

Sounds good.  And then lexgrog(1) would be a one-liner that calls
groff(1) with the appropriate flag, right?
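
To make the idea concrete, here's a rough sketch of the same extraction
done outside the formatter, on the unformatted source.  It's only an
illustration: EXTRACT is a proposal, not an existing groff(1) option,
and extract_section() is a name I made up for this mail.  It assumes a
one-line '.SH Title' heading and compares titles case-insensitively.

```shell
# extract_section TITLE  (hypothetical helper, not part of groff or man-db)
# Reads man(7) source on stdin; prints the unformatted body of section TITLE.
extract_section()
{
	awk -v want="$1" '
		/^\.SH/ {
			title = $0;
			sub(/^\.SH[ \t]+/, "", title);  # drop the macro name
			gsub(/"/, "", title);           # drop quotes around the title
			show = (toupper(title) == toupper(want));
			next;                           # never print the heading itself
		}
		show { print; }
	';
}
```

Something like `extract_section Name </tmp/proc_pid_fdinfo_mini.5` would
then print the raw '.IR' line.  Doing it in the formatter would
presumably be better, since it sees the text after macro expansion.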

> All of the features necessary to implement this[2] were part of troff as
> far back as the birth of the man(7) package itself.  It's not clear
> to me why it wasn't done back in the 1980s.

Not enough activation energy, probably, as with most things.

> lexgrog(1) itself will of course have to stay around for years to come,

You can make it a wrapper around groff(1) with flags, no?

> but this could take a significant distraction off of Colin's plate--I
> believe I have seen him grumble about how much *roff syntax he has to
> parse to have the feature be workable, and that's without upstart groff
> maintainers exploring up to every boundary that existed even in 1979 and
> cheerfully exercising their findings in man pages.
> 
> I also of course have ideas for generalizing the feature, so that you
> can request any (sub)section by name, and, with a bit more ambition,[4]
> paragraph tags (`TP`) too.
> 
> So you could do things like:
> 
> nroff -man -d EXTRACT="RETURN VALUE" man3/bsearch.3

I certainly use something like this already, as shell functions:

	#  man_section()  prints specific manual page sections (DESCRIPTION, SYNOPSIS,
	# ...) of all manual pages in a directory (or in a single manual page file).
	# Usage example:  .../man-pages$ man_section man2 SYNOPSIS 'SEE ALSO';

	man_section()
	{
		if [ $# -lt 2 ]; then
			>&2 echo "Usage: ${FUNCNAME[0]} <dir> <section>...";
			return $EX_USAGE;
		fi

		local page="$1";
		shift;
		local sect=("$@");  # keep multi-word names such as 'SEE ALSO' whole

		find "$page" -type f \
		|xargs wc -l \
		|grep -v -e '\b1 ' -e '\btotal\b' \
		|awk '{ print $2 }' \
		|sort \
		|while read -r manpage; do
			(sed -n '/^\.TH/,/^\.SH/{/^\.SH/!p}' <"$manpage";
			 for s in "${sect[@]}"; do
				<"$manpage" \
				sed -n \
					-e "/^\.SH $s/p" \
					-e "/^\.SH $s/,/^\.SH/{/^\.SH/!p}";
			 done;) \
			|mandoc -Tutf8 2>/dev/null \
			|col -pbx;
		done;
	}

	#  man_lsfunc()  prints the name of all C functions declared in the SYNOPSIS
	# of all manual pages in a directory (or in a single manual page file).
	# Each name is printed on a separate line.
	# Usage example:  .../man-pages$ man_lsfunc man2;

	man_lsfunc()
	{
		if [ $# -lt 1 ]; then
			>&2 echo "Usage: ${FUNCNAME[0]} <manpage|manNdir>...";
			return $EX_USAGE;
		fi

		for arg in "$@"; do
			man_section "$arg" 'SYNOPSIS';
		done \
		|sed_rm_ccomments \
		|pcregrep -Mn '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]*?(\.\.\.)?\s*\); *$' \
		|grep '^[0-9]' \
		|sed -E 's/syscall\(SYS_(\w*),?/\1(/' \
		|sed -E 's/^[^(]+ \**(\w+)\(.*/\1/' \
		|uniq;
	}

	#  man_lsvar()  prints the name of all C variables declared in the SYNOPSIS
	# of all manual pages in a directory (or in a single manual page file).
	# Each name is printed on a separate line.
	# Usage example:  .../man-pages$ man_lsvar man3;

	man_lsvar()
	{
		if [ $# -lt 1 ]; then
			>&2 echo "Usage: ${FUNCNAME[0]} <manpage|manNdir>...";
			return $EX_USAGE;
		fi

		for arg in "$@"; do
			man_section "$arg" 'SYNOPSIS';
		done \
		|sed_rm_ccomments \
		|pcregrep -Mv '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]+?(\.\.\.)?\s*\); *$' \
		|pcregrep -Mn \
		  -e '(?s)^ +extern [\w ]+ \**\(\*+[\w ]+\)\([\w\s(,)[\]*]+?\s*\); *$' \
		  -e '^ +extern [\w ]+ \**[\w ]+; *$' \
		|grep '^[0-9]' \
		|grep -v 'typedef' \
		|sed -E 's/^[0-9]+: +extern [^(]+ \**\(\*+(\w* )?(\w+)\)\(.*/\2/' \
		|sed    's/^[0-9]\+: \+extern .* \**\(\w\+\); */\1/' \
		|uniq;
	}

Even grepc(1) was derived from those scripts.

> 
> and:
> 
> nroff -man -d EXTRACT="OPTIONS/-b" man8/zic.8

I haven't used anything like this yet, but that's probably because it's
quite complex to implement with regexes, not because it wouldn't be
useful.
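
For what it's worth, this is roughly what the regex route looks like for
the simplest case.  A sketch only: extract_tag() is a hypothetical name,
it assumes each tag sits on the single line right after '.TP', and it
ignores .TQ, .RS/.RE nesting, and tags produced by other macros, which
is exactly where the complexity comes from.

```shell
# extract_tag SECTION TAG  (hypothetical helper)
# Reads man(7) source on stdin; prints the .TP paragraph in SECTION whose
# tag starts with the word TAG, e.g.:  extract_tag OPTIONS -b <man8/zic.8
extract_tag()
{
	awk -v sect="$1" -v tag="$2" '
		/^\.SH/ {
			name = $0;
			sub(/^\.SH[ \t]+/, "", name);
			gsub(/"/, "", name);
			insec = (name == sect);
			show = 0;
			next;
		}
		!insec { next; }
		/^\.SS/ { show = 0; next; }
		/^\.TP/ { show = 0; gettag = 1; next; }
		gettag {
			gettag = 0;
			t = $0;
			sub(/^\.[A-Z]+[ \t]+/, "", t);  # drop a font macro such as .B or .BI
			gsub(/["\\]/, "", t);           # drop quotes and backslashes: \-b -> -b
			split(t, w, /[ \t]+/);
			show = (w[1] == tag);
		}
		show { print; }
	';
}
```

The output is still unformatted source, so it would need to go through
mandoc(1) or nroff(1) afterwards, as man_section() does.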

> 
> ...does this sound appetizing to anyone?

Certainly.

> > Also, many other pages might need to be changed accordingly for
> > consistency.
> 
> I withdraw the suggestion until lexgrog(1) flexes its own muscles, or
> has groff(1) do the lifting.  I'm sorry for prompting churn, Ian.
> 
> > No, this isn't outdated, since that reduces the quality of the diff.
> > Also, I review a lot of patches in the mail client, without running
> > git(1).  And it's not just for reviewing diffs, but also for writing
> > them.  Semantic newlines reduce the amount of work for producing the
> > diffs.
> 
> It's a real win for diffs.

And diffs are a real win for text.  Thus, semantic newlines are a real
win for text.  "Write poems, not prose."  (Any chance we may get that
warning added to groff(1)?  :D)
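
A small sketch to put numbers on it, assuming diff(1) and mktemp(1) are
available; count_changed() is a made-up helper, and the filled text was
wrapped by hand.  It adds one list item to a semantic-newline version
and to a filled version of the same sentence, and counts the lines that
diff marks.

```shell
# count_changed OLD NEW  (hypothetical helper)
# Counts the lines that diff(1) marks as removed ('<') or added ('>').
count_changed()
{
	diff "$1" "$2" | grep -c '^[<>]';
}

tmp=$(mktemp -d);

# Semantic newlines: one list item per line.
printf '%s\n' 'typeface,' 'color,' 'environment,' 'or stream.' \
	>"$tmp/sem.old";
printf '%s\n' 'typeface,' 'color,' 'hyphenation language code,' \
	'environment,' 'or stream.' >"$tmp/sem.new";

# The same text, filled by hand to a fixed width.
printf '%s\n' 'typeface, color, environment,' 'or stream.' \
	>"$tmp/fill.old";
printf '%s\n' 'typeface, color, hyphenation' \
	'language code, environment,' 'or stream.' >"$tmp/fill.new";

sem=$(count_changed "$tmp/sem.old" "$tmp/sem.new");
fill=$(count_changed "$tmp/fill.old" "$tmp/fill.new");
echo "semantic: $sem, filled: $fill";
rm -rf "$tmp";
```

With these inputs it prints "semantic: 1, filled: 3": the refilled
paragraph drags neighbouring lines into the diff.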

Cheers,
Alex

> 
> Here's a very recent example from groff.
> 
> diff --git a/man/groff.7.man b/man/groff.7.man
> index 1fb635f2b..1d248b237 100644
> --- a/man/groff.7.man
> +++ b/man/groff.7.man
> @@ -1281,6 +1281,7 @@ .SH Identifiers
>  typeface,
>  color,
>  special character or character class,
> +hyphenation language code,
>  environment,
>  or stream.
>  .
> 
> 
> (So recent that in fact I haven't pushed that yet.)
> 
> Lists like the foregoing are common in man pages.
> 
> Regards,
> Branden
> 
> [1] https://man7.org/linux/man-pages/dir_by_project.html#groff
> [2] String definitions, "string comparisons"[3], and diversions.
> [3] strictly, "formatted output comparisons"
> 
>     https://www.gnu.org/software/groff/manual/groff.html.node/Operators-in-Conditionals.html
> 
>     You can do stricter string comparisons in GNU troff.  And I've
>     thought of some syntactic sugar for performing them that wouldn't
>     break backward compatibility.
> 
> [4] To really land the feature, we need automatic tag generation from
>     input text (we don't want to make the man page author construct
>     their own tags).  Another reason we want the construction to be
>     automatic is to make the tags unique when multiple man pages are
>     formatted in one run, as one might do when making a book of man
>     pages.  Automatic tagging will also enable the slaying of two other
>     ancient dragons.
> 
>     1.  deep internal links for PDF bookmarks
>     2.  pod2man's `IX`-happy output; the widespread use of this
>         nonstandard macro confuses way too many novice page authors, and
>         bloats document size.
> 
>    Another feature we'll really want to do this right is improved string
>    processing facilities.  That, too, is something that will pay
>    dividends in several areas.  With a proper string iterator in the
>    formatter (and a couple more conditional operators),[5] it will be
>    possible to write a string library as a macro file, slimming down the
>    formatter itself a little and making macro writers' lives easier.
>    We're only two days into the month and this has already come up on
>    the groff list.
> 
>    https://lists.gnu.org/archive/html/groff/2024-11/msg00002.html
> 
> [5] https://savannah.gnu.org/bugs/?62264

-- 
<https://www.alejandro-colomar.es/>