Ingo Schwarze writes:
Hi San,

Sam Varshavchik wrote on Sun, Aug 14, 2022 at 08:20:34PM -0400:
> Ingo Schwarze writes:
>> DJ Chase wrote on Sat, Aug 13, 2022 at 05:27:34PM +0000:
>>> Have we ever considered a de jure *roff standard?
>> No, i think that would be pure madness given the amount of working
>> time available in any of the roff projects.
> I tinkered with something like this some years ago, but I took a slightly
> different approach.
>
> I converted man pages

What kind of manual pages?
The ones that are the subject of discussions on linux-man@xxxxxxxxxxxxxxx.
> from 'roff source to Docbook XML using a … pretty large Perl script.

That sounds very foolish on several levels.
Well, I had some free time the other day, and had nothing better to do.
First, and most obviously, you seem to be duplicating esr@'s work on doclifter:

http://www.catb.org/~esr/doclifter/
https://gitlab.com/esr/doclifter/-/blob/master/doclifter
Seems so, except that I tailored my logic to man pages, and specifically to the linux-man@xxxxxxxxxxxxxxx manpages.
Second, quick and dirty Perl-style parsing is usually not good enough to parse roff code, and a huge script is not particularly good for readability and maintainability.
Yes, arbitrary roff code will not fly very far. But something that's tailored to a specific set of pages can produce useful results.
Yes, i know the same reservations would apply to esr@'s work, which is a giant Python 3 script. But at least there is some evidence that his work was able to find significant numbers of real issues in real manual pages.
Yes, there are plenty of issues there. I fed quite a few patches to Mr. Kerrisk when he maintained them, based on my scripts chewing through them. There were plenty of mismatched .nf/.fi, and other things of that sort.
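To illustrate the kind of check described above, here is a minimal Python sketch that flags unbalanced .nf/.fi requests in a man page source. It is not the Perl script mentioned in this message, which is not shown here. The function name find_fill_mismatches and the rules for what counts as a mismatch (a second .nf before a .fi, a .fi with no open .nf, a .nf left open at end of file) are assumptions made for the example.

```python
import re

# Matches a request line such as ".nf", ". fi" or "'nf".
# Only the no-fill (.nf) and fill (.fi) requests are tracked.
_REQUEST = re.compile(r"^[.']\s*(nf|fi)\b")


def find_fill_mismatches(source):
    """Return (line_number, message) pairs for unbalanced .nf/.fi requests.

    Hypothetical helper: the pairing rules are an assumption, not a
    statement about how any roff formatter treats these requests.
    """
    problems = []
    open_nf = None  # line number of the .nf that is currently open, if any
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = _REQUEST.match(line)
        if match is None:
            continue
        if match.group(1) == "nf":
            if open_nf is not None:
                problems.append(
                    (lineno, f".nf while .nf from line {open_nf} is still open")
                )
            open_nf = lineno
        else:
            if open_nf is None:
                problems.append((lineno, ".fi without a preceding .nf"))
            open_nf = None
    if open_nf is not None:
        problems.append((open_nf, ".nf is never closed by .fi"))
    return problems
```

A line-oriented scan like this only works because the input is restricted to conventional man(7) pages; it makes no attempt to handle macros that redefine requests or change fill mode indirectly.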
> Once a year, or so, when I have nothing better to do I pull the current
> man page tarball and reconvert it. I usually need to tinker the Perl
> script, here and there, each time.
>
> The Docbook folks provide a stylesheet that converts Docbook XML
> back to 'roff.

Yikes. That thing is by far the worst man(7) code generator existing on this planet. If at all possible, you should avoid that toolchain like the plague.
I do not view it as an authoritative source of man sources, but more as a backwards-compatibility measure. I believe that for man pages, roff should've been replaced by Docbook XML a long time ago.
That was really the original impetus for my Perl hacking: to see how feasible it would be to convert the existing man pages to Docbook XML. My end result showed at least that it was doable, and I think the Docbook XML stylesheet for man pages would've been an acceptable way to generate, from Docbook XML, some roff source that the man command can display.
> The end result you get is standardized 'roff, whatever that means.

Absolutely not. The result is utter crap. It is rarely even syntactically valid, let alone reasonable style.
I should've used "consistent" instead of "standardized". Different man pages from different sources use different ways of rendering the same content, e.g. function names. Sometimes it's in bold. Sometimes it's in italic. Sometimes it's something else. With consistent semantic markup, a <function> in every man page would've produced the same markup in the generated roff source.
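The consistency argument can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not what the Docbook stylesheets actually emit: the table FONT_FOR_ELEMENT, the helper render_inline, and the choice of bold for functions and italic for parameters are all hypothetical.

```python
# Hypothetical table mapping semantic element names to roff font letters.
# The specific choices (bold for functions, italic for parameters) are
# assumptions for the example, not a documented convention.
FONT_FOR_ELEMENT = {
    "function": "B",
    "parameter": "I",
}


def render_inline(element, text):
    """Render one semantic inline element as roff source.

    Every element of the same kind gets the same font escape, so the
    generated roff is consistent no matter who wrote the page.
    """
    # \e is the roff escape that prints the escape character itself.
    safe = text.replace("\\", "\\e")
    font = FONT_FOR_ELEMENT.get(element)
    if font is None:
        return safe  # unknown elements pass through unstyled
    return f"\\f{font}{safe}\\fR"


print(render_inline("function", "printf"))
print(render_inline("parameter", "format"))
```

Because the font decision lives in one table rather than in each page, changing how function names are rendered means editing a single entry and regenerating.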