Re: Linux man-pages Makefile portability

Alejandro Colomar <alx.manpages@xxxxxxxxx> · Sun, 3 Jul 2022 23:44:51 +0200

[added Branden, as he was involved in discussions regarding man3type;
Branden, you might want to visit this thread from the beginning, as I 
only copied the minimum to reply; it's in linux-man@]

Hi, Ingo! and Branden!

On 6/20/22 15:49, Ingo Schwarze wrote:

But that Makefile was clearly unused since no-one knows.

I'm not saying "noone knows"; some packaging tools on some Linux
distributions might very well rely on the Makefile - i don't know.

Oh, Michael seemed a bit surprised that I started patching the Makefile, 
as if the features I was patching hadn't been used in a very long time.

Anyway, I fixed `make all`.  I never liked it.  Now it builds all that 
it can build, which now is HTML pages, and in the future will probably 
include PDF pages too.  It was that, or make it a no-op, so I thought 
HTML+PDF was more useful.

All i'm saying is i don't readily see a reason why people *not*
running Linux might need the Makefile...

Well, the Makefile is basically meant to install (copy) the files to the 
system, so if you copy them somewhere, `gmake install mandir=...` 
should work, but cp(1) -r tends to be as useful for such a simple package.

If you know a better tool, I could start using it.  Maybe I could use
groff(1) directly, with grohtml(1).

I don't think using groff to generate HTML output from manual pages
is a good idea.  It generates low-quality HTML code, and it is
impossible to fix because of the basic concept how groff works.

[...]

Hmm, I'll try both and see.  Thanks.

[...]
I guess it's due to the use of $(foreach ).  I guess it's a GNU
extension and make(1POSIX) ignores it, producing an empty string.  Since
the Makefile uses a lot of functions[1], I guess it's not easy to make
it portable.
[...]
[1]:  addsuffix, wildcard, foreach, filter, patsubst, sort, shell,
basename, notdir, info.  Not sure how many of those are supported by
your make(1); maybe none?

The OpenBSD implementation of make(1) is much more powerful than
POSIX make, but according to the manual page, you are right that
none of these keywords are supported by OpenBSD make, let alone by
POSIX make.

Heh, then to be compatible with BSD make(1) I guess I'd have to hardcode 
the page names, or use suffix rules, neither of which convinces me.

Maybe suffix rules could work, but I'd have to stop testing the 
EXAMPLES, which is the most complex part of the Makefile.

I'll consider using them, or at least Substitution References[1] if it 
adds compatibility, at least for the simplest tasks, such as `make install`.

[1]: 
<https://www.gnu.org/software/make/manual/html_node/Substitution-Refs.html#Substitution-Refs>
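
One possible direction for the simplest tasks, sketched here only as an 
illustration: let the shell build the page list inside a recipe, so that 
no GNU make functions are needed for it.  The name `list_pages` and the 
assumption that pages live in <root>/man*/ are hypothetical; this is not 
what the current Makefile does.

```sh
# Hypothetical helper: print every regular file under <root>/man*/,
# relative to <root>, one per line.  Only plain shell globbing is used,
# so no GNU make functions (wildcard, foreach, ...) are involved.
list_pages()
(
	root=$1
	for f in "$root"/man*/*; do
		# An unmatched glob stays literal; the -f test filters it out,
		# and it also skips subdirectories.
		[ -f "$f" ] || continue
		printf '%s\n' "${f#"$root"/}"
	done
)
```

A recipe could then loop over the output of `list_pages .` and copy each 
file, at the cost of losing per-file dependency tracking.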

No, you didn't.  I expected autocomplete to help,

I almost never use autocomplete except for the names of commands that are
installed system-wide and for the names of files in the local file system.

Oh, I use autocomplete every day for things like arguments to git 
commands, and I really feel it when I'm in a system where I don't have 
such help.

I didn't even know it is possible to use autocompletion for make targets,
and i dislike the idea.  But don't worry!  Your build system *is*
complicated for a package that actually doesn't need to build anything,
but not so bad that i didn't find what i looked for.  :)

:)

As a side remark, i consider it bad style to use dependencies during
installation: dependencies are for the build stage, not for the
installation stage.  When i say "make install", i just want *all*
the files installed unconditionally for two reasons: On the one hand,
dependency handling is error-prone and it would be bad if some file
does not get installed due to the notorious problem of oversights in
dependency handling (and dependency handling in parallel Makefiles
is even more fragile than in serial ones).  On the other hand, "make
install" also has the purpose of repairing an installation that got
broken in some way or other, and skipping some files because the
build system *thinks* they are probably still installed properly
defeats the purpose IMHO.

I've had doubts about that, and in the past I tended to do the same as 
you suggest, not for fear of broken deps, but to make sure I 
don't create temporary files owned by root.  But in this case, where 
there are thousands of files to install, there's an important time 
difference between installing just the diff and installing the whole 
repo, so I asked the following question[2] just to confirm my doubts, 
and added the deps.

Regarding the possibility of broken deps, I believe the solution is to 
fix the Makefile, not to assume that it can't be done right and make it 
dumb; and I try hard to make sure my Makefiles work in multi-process 
mode.  There's always a chance that I got some corner case wrong, but 
in this case it's pretty low (and if someone doesn't trust my Makefiles to 
behave well with -j, I don't force anyone to use it, but I recommend it 
very much :)).

Regarding `make install` having a secondary purpose of being kind of a 
reinstall, I disagree.  I tend to write an explicit `make reinstall` 
target for that purpose (implemented as 
`\t$(MAKE) uninstall\n\t$(MAKE) install`); I haven't written it yet for 
the man-pages, but I'm going to add it now.

[2]: 
<https://stackoverflow.com/questions/70901364/should-make-install-depend-on-compilation>

That's a lot, but it has
its advantages (generating the file list on-the-fly; no ./configure).

Then, the actual installation of the ~2.5k pages (most of them are link
pages),

As another aside, i consider using .so bad style.  It is unnecessarily
fragile.  Using hard links on the file system level (see ln(1))
is significantly more robust.  With mandoc(1), you don't need links
at all, but i admit traditional man(1) implementations including
man-db still require them for manual pages having more than one name.

I also had that feeling at first.  I just leave it there because of 
"don't fix it if it ain't broke" and it just works.  .so has a good 
side, which is that the Makefile is simpler, as it doesn't need to 
create links.
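
To make the two approaches concrete, here is a minimal sketch.  The 
function names are hypothetical, and the paths are passed relative to a 
man root directory, which is an assumption about the layout.

```sh
# Hypothetical helpers comparing two ways of giving a page a second name.

# Write a link page: a one-line file whose .so request points at the
# real page, with the path given relative to the man root.
mk_so_page()
{
	# $1 = man root, $2 = target page (e.g. man3/list.3), $3 = link page
	printf '.so %s\n' "$2" > "$1/$3"
}

# Create a hard link instead: both names then refer to the same inode.
mk_hard_link()
{
	# $1 = man root, $2 = existing page, $3 = additional name
	ln "$1/$2" "$1/$3"
}
```

With the first form the link pages are ordinary files that can be kept 
in the repository; with the second, the links have to be created at 
install time.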

takes another 1.4 s in multi-process mode, and 6 s in
single-process mode (so at least 4.6 s that are not I/O).  Maybe it's
make(1) that has a hard time traversing the tree... I don't know where
the bottleneck is, but it's clearly there.

I see.  So you need multiple processors purely for dealing with make(1)
overhead...  Gee!  :-/

Yupee!  :/

I'll see if I can reduce that overhead without losing features.  Maybe 
I'll improve compatibility along the way.  :)

[...]
BTW, did you check the changes to queue.3?  I guess you could improve
yours in a similar manner.

<https://linux-man-pages.blogspot.com/2020/11/man-pages-509-is-released.html>

I think in OpenBSD, these changes would get vetoed by large numbers
of developers because they violate the way OpenBSD manual pages are
organized in several ways:

  1. Your queue(7) manual page is placed in the wrong section.
     It is purely about an API provided by a library for the C language.
     Such information unambiguously belongs in section 3 and certainly
     not in section 7.  It is not even an edge case; it is perfectly
     clear what the correct section is.

See 2.

  2. Your file names and .TH names violate the OpenBSD convention
     that section 2 and 3 manual pages must be named after functions
     or macros.  For example, the page name "slist" is not acceptable
     because no slist() function or macro exists.

Heh, I agree!  I would have put them in section 7, but I was new to the 
project, and didn't want to change things too much at the time.  Since 
queue(3) was in man3, I kept the tradition, and the child pages were 
kept in man3.  Probably I should have put them in man7, but blame 
history, not me :)

Buuut, is it me, or do I see a contradiction with point 1, which claims 
that queue(3) should be in man3?  We don't have a slist() 
function/macro, but we don't have a queue() one either (maybe 
historically there was one and I don't know it, but I guess not).  My 
systems say:

alx@devuan:/usr/include$ grepc queue
alx@devuan:/usr/include$

alx@debian:/usr/include$ grepc queue
alx@debian:/usr/include$

So, should queue() now be in man3 or man7?

  3. Splitting the page up into multiple pages is a bad idea for
     two reasons: it results in significant duplication of information
     and it splits information about interfaces so closely related
     to each other that most of their features are identical across
     multiple pages.

Actually, I didn't duplicate information at all, AFAIR.  It was already 
_very_ duplicated in the same queue(3) page, so I just split it at 
the right points.  I only had to cut the page into many little ones, 
then translate the pages from mdoc(7) to man(7), and then fix minor 
style issues.

See:

$ wc -l man3/circleq.3 man3/list.3 man3/slist.3 man3/stailq.3 \
        man3/tailq.3 man7/queue.7
  318 man3/circleq.3
  306 man3/list.3
  317 man3/slist.3
  375 man3/stailq.3
  395 man3/tailq.3
  133 man7/queue.7
 1844 total
$ git checkout man-pages-5.08 >/dev/null 2>&1
$ wc -l man3/queue.3
1231 man3/queue.3

The difference is just source code overhead; the text is almost the same.
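
Spelling out the arithmetic behind those counts (the numbers are copied 
from the wc(1) output above; the variable names are only for 
illustration):

```sh
# Line counts copied from the wc(1) output above.
split_total=$((318 + 306 + 317 + 375 + 395 + 133))  # six pages after the split
single_page=1231                                    # man3/queue.3 in man-pages-5.08
overhead=$((split_total - single_page))
echo "split: $split_total, single: $single_page, extra lines: $overhead"
```

That is 613 extra lines spread across six pages.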

Maybe you could still simplify your queue(7) page in a different way, 
without splitting it; it is very repetitive.

You would have no chance of getting anything like that committed to
OpenBSD.

Heh, I know.  The only thing that was well received from my side in that 
list was a bug report about some exec(3) function (about alloca(3)).

Also, if you have been following the addition of pages about types, and
would like to comment, you'll be welcome!

<https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/commit/?id=178eaf37e2e971cae88bd4d3f124ede0afbb1015>

BSD doesn't have manual pages about types, and i don't think there is a
significant benefit from having them.  Most standard types are trivial
and can easily be looked up in the header files with no need for separate
documentation.

Oh, yes, I envy your headers.  They are really readable!
But glibc source code is not as friendly, and it took me a long long 
time to get used to searching for things in those headers (I still can't 
find some things in their code; function definitions, for example, are 
very cryptic in some cases).

Also, some programmers, especially when starting (but I know of many 
programmers that are "senior" and still have serious issues with types), 
would benefit from documentation specifically about types.  That would 
help them understand the limitations of each type, and what it is and 
isn't appropriate for.

 In the unusual case that a type has non-trivial syntax
and/or semantics, it can be documented in the manual page of the most
closely related API function; for example, "struct pollfd" is documented
in our poll(2) page.

That makes it easy to find with the usual
"man -k Vt=typename" search command.

Oh, that's where man(7) sucks :)

I wouldn't need these:

$ sed -n /^man_lsfunc/,/^}/p <scripts/bash_aliases
man_lsfunc()
{
	if [ $# -lt 1 ]; then
		>&2 echo "Usage: ${FUNCNAME[0]} <manpage|manNdir>...";
		return $EX_USAGE;
	fi

	for arg in "$@"; do
		man_section "$arg" 'SYNOPSIS';
	done \
	|sed_rm_ccomments \
	|pcregrep -Mn '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]*?(...)?\s*\); *$' \
	|grep '^[0-9]' \
	|sed -E 's/syscall\(SYS_(\w*),?/\1(/' \
	|sed -E 's/^[^(]+ \**(\w+)\(.*/\1/' \
	|uniq;
}
$ sed -n /^man_lsvar/,/^}/p <scripts/bash_aliases
man_lsvar()
{
	if [ $# -lt 1 ]; then
		>&2 echo "Usage: ${FUNCNAME[0]} <manpage|manNdir>...";
		return $EX_USAGE;
	fi

	for arg in "$@"; do
		man_section "$arg" 'SYNOPSIS';
	done \
	|sed_rm_ccomments \
	|pcregrep -Mv '(?s)^ [\w ]+ \**\w+\([\w\s(,)[\]*]+?(...)?\s*\); *$' \
	|pcregrep -Mn \
	  -e '(?s)^ +extern [\w ]+ \**\(\*+[\w ]+\)\([\w\s(,)[\]*]+?\s*\); *$' \
	  -e '^ +extern [\w ]+ \**[\w ]+; *$' \
	|grep '^[0-9]' \
	|grep -v 'typedef' \
	|sed -E 's/^[0-9]+: +extern [^(]+ \**\(\*+(\w* )?(\w+)\)\(.*/\2/' \
	|sed    's/^[0-9]\+: \+extern .* \**\(\w\+\); */\1/' \
	|uniq;
}

And they're not perfect...

Documenting a non-trivial type separately from the functions using it
is counter-productive and makes the documentation hard to read because
programmers *never* need to use a type defined by a library unless they
also want to use related API functions.

That's not true.  I've needed (or better phrased, wanted) types, even 
when I wasn't using any APIs that used them.  The reason was that I was 
designing an API, and wanted to use the most appropriate types for my 
functions.

Having documentation about types would have helped *a lot* at the time.

Many programmers don't know all the differences between size_t and 
ssize_t, and for example that ISO C only provides one of them (the other 
is added by POSIX), and I know of one *great* programmer that learnt the 
difference between those types from me a few months ago :).

One could search in the standard documents about the types, but I guess 
we will agree that those documents are not very friendly, especially for 
beginners.

Another case where I've found my type pages very useful was when I 
contributed[3] to iwyu(1)[4].  Having to read the POSIX or ISO C 
documents would have been crazy (I had to do it anyway, to write the 
pages, but I don't want to repeat that process again ;)).

[3]: <https://github.com/include-what-you-use/include-what-you-use/pull/930>
[4]: <https://include-what-you-use.org/>

 Actually, i find it better
to *not* add type names as names of manual pages because that way,
the classical syntax "man functname" can be used to search for
function and macro names and the advanced syntax "man -k Vt=typename"
to search for types, with less potential for confusion.

We don't have that feature in man(7), so the closest thing that I do is 
to grep in the glibc and BSDs source code with grepc(1)[5], and also 
`grep -rn ...` inside the man-pages repo.

[5]: <http://www.alejandro-colomar.es/src/alx/alx/grepc.git/>

So in OpenBSD, your pages about types would get vetoed on the grounds
of "pages not named after functions or macros" as well as on the
grounds of "these pages do not document any function or macro;
instead of creating a new page, put the information where it belongs."

I agree with the first argument, and it's why I didn't use section 3, 
but subsection 3type.

I disagree with the second, for the reasons above, but can understand 
why others might disagree with me.  Maybe I can convince you :)

That said, other projects are of course free to have such pages if
they want to.  The mandoc(1) program is also able to handle paths like
"man3/id_t.3type".  It will consider that page to be *both* in section
"3" (as specified by the directory name) and in section "3type" (as
specified by the file name and by the .TH macro).  I would consider
it better style to keep section names consistent, i.e. to use either
"man3/id_t.3" .TH id_t 3 or "man3type/id_t.3type" .TH id_t 3type,
but it's not a big deal: since many systems (in particular various
Linux distros) suffer from such inconsistencies, handling such
inconsistencies gracefully is an important feature that certainly
won't get removed.

I considered[6] using man3type, and used man3 in the end just because 
when in doubt I opted for the smallest change.  Knowing that it breaks 
mandoc(1), I'll definitely move to <man3type/>.

[6]: 
<https://lore.kernel.org/linux-man/761bb12f-31e0-369d-8315-d2e1545505c7@xxxxxxxxx/T/#u>
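
As a small illustration of the inconsistency being discussed, the sketch 
below extracts the section implied by the directory name and the one 
implied by the file suffix.  The function name `page_sections` is 
hypothetical, and this is only plain string handling, not how mandoc(1) 
itself is implemented.

```sh
# Hypothetical helper: print the section implied by the directory name and
# the section implied by the file name suffix, separated by a space.
page_sections()
(
	path=$1
	dir=${path%/*}        # "man3/id_t.3type" -> "man3"
	dir=${dir##*/}        # keep only the last directory component
	file=${path##*/}      # "id_t.3type"
	printf '%s %s\n' "${dir#man}" "${file##*.}"
)

page_sections man3/id_t.3type
page_sections man3type/id_t.3type
```

The first call prints "3 3type" (the two sections differ), the second 
prints "3type 3type" (they agree).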

  Commands like

     $ man -M /co/man-pages open

work perfectly fine on my system to view the Linux open(2) manual,
nicely formatted, with no need for installation or a Makefile.
Even when i put up a copy at

    https://man.bsd.lv/Linux-5.13/open

Yes, since there's no compilation, `make install` is basically a wrapper 
around `cp -r`.

It has nice features, such as reduced install time by checking 
timestamps, but that's more useful to me as a maintainer (since I 
install several times a minute in some cases), and not so much for end 
users, where a few seconds are not important.
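
The timestamp check can be sketched in plain shell as follows.  This is 
a hypothetical stand-in for what make(1) decides from its dependency 
rules, not the code in the Makefile; the function name is made up, and 
it relies on the `-nt` test operator, which common shells provide.

```sh
# Hypothetical sketch: copy a page only when the installed copy is
# missing or older than the source, similar in spirit to what make(1)
# decides from file timestamps.
install_if_newer()
{
	# $1 = source file, $2 = destination file
	if [ ! -e "$2" ] || [ "$1" -nt "$2" ]; then
		# cp -p keeps the source mtime, so an unchanged page is
		# skipped on the next run.
		mkdir -p "$(dirname "$2")" && cp -p "$1" "$2"
	fi
}
```

An unconditional install, as suggested earlier in the thread, would 
simply drop the `if` and always run the copy.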

How do you generate your HTML pages?  mandoc(1)?  They are nice.

https://man.openbsd.org/mandoc.1#HTML_Output
https://man.openbsd.org/man.cgi.8

I think that's the usual way to generate HTML from manual pages
nowadays.  The following sites also use mandoc for HTML output:

  * https://www.freebsd.org/cgi/man.cgi
  * https://manpages.debian.org/
  * https://man.archlinux.org/
  * https://man.voidlinux.org/

Some of these have their own CGI handling and/or database code,
but they all use the mandoc parser and HTML generator.

Interesting.  Thanks!

Cheers,

Alex

--
Alejandro Colomar
<http://www.alejandro-colomar.es/>