Man page titles, identifiers, capitalization, and hyphens therein

Hi Alex,

At 2025-01-14T14:19:49+0100, Alejandro Colomar wrote:
> > @@ -95,8 +95,8 @@ .SS Title line
> >  The arguments of the command are as follows:
> >  .TP
> >  .I title
> > -The title of the man page, written in all caps (e.g.,
> > -.IR MAN-PAGES ).
> > +The title of the man page, written in lowercase (e.g.,
> > +.IR man-pages ).
> 
> Actually,

To try to bring order to the chaos and confusion surrounding this
subject, I use the term "identifier".

groff_man(7):
   Document structure macros
     Document structure macros organize a man page’s content.  All of
     them break the output line.  .TH (title heading) identifies the
     document as a man page and configures the page headers and footers.
...
     .TH identifier section [footer‐middle [footer‐inside [header‐
     middle]]]
            Populate the page header and footer.  Together, identifier
            and the section of the manual to which it belongs can
            uniquely identify a man document on the system.  See man(1)
            or intro(1) for the manual sectioning applicable to your
            system.  identifier and section are positioned at the left
            and right in the header; the latter is set after the former,
            in parentheses and without space.
...
     .SH [heading‐text]
            Set heading‐text as a section heading.
...
            The content of heading‐text and ordering of sections follows
            a set of common practices, as does much of the layout of
            material within sections.  For example, a section called
            “Name” or “NAME” must exist, must be the first section after
            the .TH call, and must contain only text of the form
                   topic[, another‐topic]... \- summary‐description
            for tools like makewhatis(8) or mandb(8) to index them.

groff_man_style(7):
     • What’s the difference between a man page topic and identifier?

       A single man page may document several related but distinct
       topics.  For example, printf(3) and fprintf(3) are often
       presented together.  Moreover, multiple programming languages
       have functions named “printf”, and may document these in a man
       page.  The identifier is intended to (with the section) uniquely
       identify a page on the system; it may furthermore correspond
       closely to the file name of the document.

       The man(1) librarian makes access to man pages convenient by
       resolving topics to man page identifiers.  Thus, you can type
       “man fprintf”, and other pages can refer to it, without knowing
       whether the installed document uses “printf”, “fprintf”, or even
       “c_printf” as an identifier.
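
To make the topic/identifier distinction concrete, here is a rough
Python sketch of splitting a "Name" section line, in the form quoted
from groff_man(7) above, into its topics and summary description.  It is
only an illustration; it is not how makewhatis(8) or mandb(8) actually
work, and the function name parse_name_line is made up for this example.

```python
def parse_name_line(line):
    """Split 'topic[, another-topic]... \\- summary' into (topics, summary)."""
    # The separator is the roff escape sequence \- with a space on each side.
    topics_part, sep, summary = line.partition(" \\- ")
    if not sep:
        raise ValueError("no ' \\- ' separator found")
    # Topics are comma-separated; drop surrounding white space and empties.
    topics = [t.strip() for t in topics_part.split(",") if t.strip()]
    return topics, summary.strip()


# Illustrative input; several topics share one page.
topics, summary = parse_name_line(
    r"printf, fprintf, sprintf \- formatted output conversion"
)
print(topics)
print(summary)
```

Each topic found this way can point at the same page, whatever
identifier that page gives to `TH`.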

> the title should follow the name of the page.

I don't understand how the "name" is distinct from the "title" in your
usage.

> Usually, this is lowercase, but in some cases

It can certainly be mixed case; X11-related man pages have been around
for longer than many Linux users have been alive.

XmCreatePushButtonGadget (3) - The PushButtonGadget creation function ...
XtVaAppInitialize (3) - initialize, open, or close a display

> it should be sentence case,

I wouldn't apply that term here.  A man page identifier (the first
argument to `TH`) will not be a sentence.  Nor will it comprise multiple
words separated by spaces.  Not because it strictly could not, but
because it would be impractical to do so, and might expose bugs in man
page indexers like makewhatis(8) and mandb(8).

> or upper case,

I advise this only when the identifier would be shouted in other
contexts, like X(7).

> if the name is something like UTF-8,

(by which you mean "uses code points outside the Basic Latin range")

...that's _also_ going to put a heavier load on indexers.  Also, due to
the possibility of homoglyph attacks and the sheer cussedness of
inputting non-ASCII characters in some environments, especially when one
has to bring up a man page from a machine other than one's own
tricked-out, optimally configured, liquid-cooled Genesis Device, I'd
avoid using, in man page identifiers, UTF-8 code points requiring more
than one byte to encode.
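
As a sketch of how one might check an identifier against that advice,
the following Python lists the characters that need more than one byte
in UTF-8.  The helper name multibyte_chars is hypothetical; nothing in
groff or man-db provides it.

```python
def multibyte_chars(identifier):
    """Return the characters of identifier needing more than one UTF-8 byte."""
    return [ch for ch in identifier if len(ch.encode("utf-8")) > 1]


# Basic Latin identifiers, mixed case included, come back clean.
print(multibyte_chars("XtVaAppInitialize"))
# A typographical hyphen (U+2010) is flagged; a hyphen-minus is not.
print(multibyte_chars("man\u2010pages"))
print(multibyte_chars("man-pages"))
```

An empty list means every character is in the Basic Latin range and can
be typed on any keyboard one is likely to meet.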

> So, I would instead just remove the ", written in ..." part.

Speaking of UTF-8, favoring underscores over hyphens in man page
identifiers that aren't command names or C identifiers may be a good
idea, because this is yet another site of ambiguity created by UTF-8's
embrace some years ago: at long last the minus sign and hyphen were
de-unified in everyday shell usage, and it's frequently not obvious to
the ignorant and swift to anger which character they mean when they
strike the '-' key on their keyboard.

So while the '-' in, say, "xdg-open" should definitely be a hyphen-minus
when appearing in a man(7) `TH` macro call (thus keyed in as `\-`),
because the hyphen-minus is part of its _name_ as installed on the
system, I see the following pages in /usr/share/man/man7 on my system,
where the hyphen appears to be present to make contiguous a noun phrase
that in prose would be written with word spaces.

	file-hierarchy.7
	frontend-spec.7
	gitcore-tutorial.7
	man-pages.7		;-)
	nmcli-examples.7
	signal-safety.7

So, considered as text, are these hyphens, or are these hyphen-minuses?

It depends on how you look at the problem.  When writing prose, they're
neither--we would use a space instead.  But as noted above, a space
isn't going to play nicely with common techniques for manipulating file
names on POSIX systems, and man page indexers might choke on them.  So
we instead employ a visible character to separate the words.

One of the causes of hyphen-minus grievance in our epic struggles of the
past 15 years or so in this area is the fact that people (reasonably)
want to copy text from man pages and paste it into shell scripts or onto
the command line.  Where the hyphen-minus has a semantic identity as
such, as in command or file names or C language expressions, we should
definitely be encoding it such that it renders _as_ a hyphen-minus.
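
For anyone who doubts that more than one character is in play, this
small Python sketch prints the Unicode names of the three usual
suspects and shows why a pasted command breaks.  Which of them a given
formatter emits for an unescaped `-` depends on the output device and
local configuration, so the sketch makes no claim about that; it only
shows that the characters differ.  The names candidates, describe, and
pasted are mine.

```python
import unicodedata

# Three characters that can all look like "-" on a terminal.
candidates = {
    "hyphen-minus": "\u002d",
    "hyphen": "\u2010",
    "minus sign": "\u2212",
}


def describe(ch):
    """Return the code point, Unicode name, and UTF-8 length of ch."""
    return "U+%04X %s (%d UTF-8 byte(s))" % (
        ord(ch),
        unicodedata.name(ch),
        len(ch.encode("utf-8")),
    )


for label, ch in candidates.items():
    print(label, "->", describe(ch))

# A command name copied with a typographical hyphen is a different string.
pasted = "xdg\u2010open"
print(pasted == "xdg-open")
```

The shell compares bytes, not glyphs, so only the hyphen-minus spelling
names the installed command.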

When what we're copying has no symbolic representation in a programming
language or the file system, the hyphen-minus loses its reason for
existence (except when discussing the crater that the compromises made
by the ANSI committee behind X3.4 left in our storage and communications
technologies).

(Arguing that all of the foregoing *.7 examples are instances of file
names reverses cause and effect.  They have the file names they do
because someone _chose_ man page identifiers for them that included
hyphens.)

There's long-standing precedent for throwing one's hands up at the
inherent slipperiness of the hyphen-minus character.  The parts of the
ISO 8859 character encoding standard have numeric identifiers.  The
Linux man-pages project, for example, installs a page named

	iso_8859-1.7

but provides a symbolic link

	iso_8859_1.7

to do what the baffled user means when they guess wrong.  Of course the
ISO committee didn't give its standard an identifier using either of
those characters.  They used a _typographical_ hyphen, and left its
representation on file systems as a problem for others to solve.

I see no reason to harangue anyone into renaming their hyphenated
chapter 7 man pages to use underscores instead of hyphen-minuses; I am
simply pointing out that whether to say, for example:

	.TH man-pages 7 2025-01-14 "Linux man-pages 6.9.1"

or

	.TH man\-pages 7 2025-01-14 "Linux man-pages 6.9.1"

...is a difficult question to decide "correctly".

The difficulty is compounded by bros who shriek like toddlers -- knowing
nothing of typography or the histories of character encoding standards,
the challenges faced by the groups developing them, or their
implementations in operating systems and applications -- and insist that
only one symbol is under discussion here, and it's whatever shows up on
the screen when they mash a "-" key with a saliva-saturated finger.[1]

The solutions we've developed for these problems are designed _so that_
one doesn't have to possess such specialized knowledge to access the man
pages one wants to read.  More people read man pages than write them.
The burden shifts to the man page author, who occasionally, in man(7),
has to type a `\` before a `-` instead of bashing out text in a single
draft and never looking back as they would in Markdown.  When one
undertakes to instruct others, as the process of man page authorship
implies, one must be humble enough to also engage in the process of
_learning_.  I admit: that's not the Cowboy Way.  Not the Rock Star Way.

Getting back to the capitalization issue, the rendering aspect of it is,
of course, configurable in groff.

>On 10/30/22 23:00, G. Branden Robinson wrote:
>> For those to whom this change is coming as an unpleasant surprise,
>> the forthcoming groff 1.23.0

Now released for over a year and a half.

>> features an option that will reverse this change at rendering time.
>>
>>  From groff_man(7):
>>
>>     -rCT=1 Capitalize titles, setting the man page title (the first
>>            argument to .TH) in full capitals in headers and footers.
>>            This transformation is off by default because it discards
>>            case distinction information.
>>
>> This register can also be set in a site-local "man.local" file to
>> force it on for all pages.  On Debian-based systems, this file is in
>> /etc/groff.  The following line will do the trick.
>>
>> .nr CT 1
>>
>> The groff_man_style(7) man page offers further examples of such
>> rendering customization.

Regards,
Branden

[1] https://lwn.net/Articles/947941/
