Re: PDF book of unreleased pages (was: strncpy clarify result may not be null terminated)

Alejandro Colomar <alx@xxxxxxxxxx> · Mon, 20 Nov 2023 10:43:53 +0100

Hi Branden,

On Sun, Nov 19, 2023 at 06:46:29PM -0600, G. Branden Robinson wrote:
> Hi Alex and Deri,
> 
> I'm going to address just a few small parts of this message...
> 
> At 2023-11-19T21:58:03+0100, Alejandro Colomar wrote:
> > You can always `find ... | xargs cat | troff /dev/stdin`
> 
> ...not if you need to preprocess any of the input.  With tbl(1), for
> instance.

What I mean is that I can preprocess individually:

find ... | while IFS= read -r f; do eqn "$f" > "$f.troff"; done

And only process together in a single invocation what _needs_ to be done
in a single invocation:

find ... | xargs cat | gropdf /dev/stdin

I guess that preprocessors can be run per-file.
I know that gropdf(1) must be run with the entire book as input.
But I don't know if `troff -Tpdf` needs to see the entire book at once,
or if it can process each file separately.
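
Here is a minimal sketch of that split in bash.  It assumes, as guessed above but not confirmed, that preconv(1), tbl(1), and eqn(1) can safely run on each page alone, while troff(1) and gropdf(1) must see the whole book in one invocation.  The function names, the ".pre" suffix for intermediate files, and the use of the man(7) macros are hypothetical.  A real build likely passes more options than shown here.

```shell
#!/usr/bin/env bash
# Sketch: preprocess each page on its own, then format and render the
# whole book in a single troff | gropdf invocation.
# Assumptions (hypothetical): pages are passed as arguments, use the
# man(7) macros, and are self-contained for the preprocessors (e.g. no
# eqn(1) "define" or "delim" carried over from an earlier page).
set -euo pipefail

# Run preconv(1), tbl(1), and eqn(1) on one page, writing "<page>.pre".
preprocess_page() {
	local page=$1
	preconv "$page" | tbl | eqn -Tpdf > "$page.pre"
}

# Usage: build_book OUTPUT.pdf PAGE...
# Preprocess every page separately, then concatenate the results so
# that troff(1) and gropdf(1) each see the entire book at once.
build_book() {
	local out=$1
	shift
	local page
	for page in "$@"; do
		preprocess_page "$page"
	done
	for page in "$@"; do
		cat "$page.pre"
	done | troff -Tpdf -man | gropdf > "$out"
}
```

Usage would be along the lines of `build_book book.pdf page1.7 page2.7`, where the page names are placeholders.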

On my laptop, the pipeline for building the Linux Man Book takes 23.3 s.
I've split the processing of the book so that I produce every
intermediate file in the pipeline (except for pic(1), which I think we
don't need).  From that, I've measured the time each program takes to do
its job (importantly, the overall time wasn't any slower; it again took
23.3 s): preconv(1) takes 0.04 s; tbl(1) takes 0.06 s; eqn(1) takes
0.05 s; troff(1) takes 2.8 s; and gropdf(1) takes 17.6 s.
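
Here is a minimal sketch of that kind of per-stage measurement.  It assumes bash, for its `time` keyword and the TIMEFORMAT variable, and a book source already concatenated into a single file that uses the man(7) macros.  As above, pic(1) is skipped.  The function names and the book.* file names are hypothetical and only illustrative.

```shell
#!/usr/bin/env bash
# Sketch: run each stage of a groff PDF pipeline separately, keeping
# every intermediate file, and report how long each stage took.
# Assumptions (hypothetical): one concatenated source file using the
# man(7) macros, and no pic(1) stage.
set -euo pipefail

# Time one command.  Bash's `time` keyword writes its report to stderr
# using TIMEFORMAT; %R is the elapsed real time in seconds.  The label
# is the command name ($1).
time_stage() {
	local TIMEFORMAT="$1: %R s"
	time "$@"
}

# Usage: time_stages SOURCE
# Writes book.preconv, book.tbl, book.eqn, book.troff, and book.pdf
# in the current directory.
time_stages() {
	local src=$1
	time_stage preconv "$src" > book.preconv
	time_stage tbl book.preconv > book.tbl
	time_stage eqn -Tpdf book.tbl > book.eqn
	time_stage troff -Tpdf -man book.eqn > book.troff
	time_stage gropdf book.troff > book.pdf
}
```

Running `time_stages book.man` would print one line per stage on stderr, such as "troff: <seconds> s", and leave every intermediate file in place for inspection.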

The time taken by gropdf(1) is unavoidable, since it can't process the
individual files separately.  But if we can reduce the time taken by all
the other programs to close to 0, that would be good.  It depends on
which programs need to see the entire book, and which can process each
file separately.

Nevertheless, I think it's interesting to process the book per-file as
much as possible, even if the overall time won't change significantly.
It is good documentation of what needs to be processed together and
what doesn't when building a PDF document with groff.

> > My problem is probably that I don't know what's done by `gropdf`, and
> > what's done by `troff -Tpdf`.  I was hoping that `troff -Tpdf` still
> > didn't need to know about the entire book, and that only gropdf(1)
> > would need that.
> 
> This stuff is documented in groff's Texinfo manual, and in the groff(1)
> and roff(7) man pages.
> 
> Here's an excerpt of the last.
> 
> Using roff
>        When you read a man page, often a roff is the program rendering
>        it.  Some roff implementations provide wrapper programs that make
>        it easy to use the roff system from the shell’s command line.
>        These can be specific to a macro package, like mmroff(1), or more
>        general.  groff(1) provides command‐line options sparing the user
>        from constructing the long, order‐dependent pipelines familiar to
>        AT&T troff users.  Further, a heuristic program, grog(1), is
>        available to infer from a document’s contents which groff
>        arguments should be used to process it.
> 
>    The roff pipeline
>        A typical roff document is prepared by running one or more
>        processors in series, followed by a formatter program and then
>        an output driver (or “device postprocessor”).  Commonly, these
>        programs are structured into a pipeline; that is, each is run in
>        sequence such that the output of one is taken as the input to the
>        next, without passing through secondary storage.  (On non‐Unix
>        systems, pipelines may have to be simulated with temporary
>        files.)
> 
>         $ preproc1 < input‐file | preproc2 | ... | troff [option] ... \
>             | output‐driver
> 
>        Once all preprocessors have run, they deliver pure roff language
>        input to the formatter, which in turn generates a document in a
>        page description language that is then interpreted by a
>        postprocessor for viewing, printing, or further processing.
> 
> gropdf(1) is the output driver for the PDF "device".  So "groff -T pdf
> input.tr" and "troff -T pdf input.tr | gropdf" are equivalent.
> 
> (Yes, you still need the `-T pdf` arguments, even to troff proper.

This doesn't answer my question.  For generating a book, does troff(1)
need to see the entire book, or is it enough if gropdf(1) does?  My guess is
that troff(1) also needs to see the entire book, but I don't know for
sure.

Cheers,
Alex

-- 
<https://www.alejandro-colomar.es/>