On Sun, Nov 19, 2023 at 04:21:45PM +0000, Deri wrote:

> > $ touch man2/membarrier.2
> > $ make build-pdf
> > PRECONV .tmp/man/man2/membarrier.2.tbl
> > TBL     .tmp/man/man2/membarrier.2.eqn
> > EQN     .tmp/man/man2/membarrier.2.pdf.troff
> > TROFF   .tmp/man/man2/membarrier.2.pdf.set
> > GROPDF  .tmp/man/man2/membarrier.2.pdf
> >
> > That helps debug the pipeline, and also learn about it.
> >
> > If that helps parallelize some tasks, then that'll be welcome.
>
> Hi Alex,

Hi Deri,

> Doing it that way actually stops the jobs being run in parallel!  Each step

Hmm, kind of makes sense.

> completes before the next step starts, whereas if you let groff build
> the pipeline all the processes are run in parallel.  Using separate
> steps may be desirable for "understanding every little step of the
> groff pipeline" (and

Still a useful thing for our build system.

> may aid debugging an issue), but once such knowledge is obtained it is
> probably better to leave the pipelining to groff in a production
> environment.

Unless performance is really a problem, I prefer the understanding and
debugging aid.  It'll help not only me, but also others who see the
project and would like to learn how all this magic works.

> > > The time saved would be absolutely minimal.  It is obvious that to
> > > produce a pdf containing all the man pages, all the man pages have
> > > to be consumed by groff, not just the page which has changed.
> >
> > But do you need to run the entire pipeline, or can you reuse most of
> > it?  I can process in parallel much faster, with `make -jN ...`.  I
> > guess the .pdf.troff files can be reused; maybe even the .pdf.set
> > ones?
> >
> > Could you change the script at least to produce intermediary files
> > as in the pipeline shown above?  As many as possible would be
> > excellent.
>
> Perhaps it would help if I explain the stages of my script.  First, a
> look at what the script needs to do to produce a pdf of all man pages.
> There are too many files to produce a single command line with the
> filenames of every man page, and groff has no mechanism for passing a
> list of filenames, so the first job is

You can always `find ... | xargs cat | troff /dev/stdin`

> to concatenate all the separate files into one input file for groff.
> And while we are doing that, add the "magic sauce" which makes all the
> pdf links in the book and sorts out the aliases which point to another
> man page.

Yep, I think I partially understood that part of the script today.
It's what this `... | LC_ALL=C grep '^\\. *ds' |` pipeline produces and
passes to groff, right?

> After this is done there is a single troff file, called LMB.man, which
> is the

That's what's currently called LinuxManBook.Z, right?

> file groff is going to process.  In the script you should see
> something like this:-
>
>     my $temp='LMB.man';

I don't.  Maybe you have a slightly different version of it?

>     [...]
>
>     my $format='pdf';
>     my $paper=$fpaper || '';
>     my $cmdstring="-T$format -k -pet -M. -F. -mandoc -manmark -dpaper=$paper -P-p$paper -rC1 -rCHECKSTYLE=3";
>     my $front='LMBfront.t';
>     my $frontdit='LMBfront.set';
>     my $mandit='LinuxManBook.set';
>     my $book="LinuxManBook.$format";
>
>     system("groff -T$format -dpaper=$paper -P-p$paper -ms $front -Z > $frontdit");

This creates the front-page .set file.

>     system("groff -z -dPDF.EXPORT=1 -dLABEL.REFS=1 $temp $cmdstring 2>&1 |
>             LC_ALL=C grep '^\\. *ds' |

This creates the bookmarks, right?

>             groff -T$format $cmdstring - $temp -Z > $mandit");

And this is the main .set file.

>     system("./gro$format -F.:/usr/share/groff/current/font $frontdit $mandit -p$paper > $book");

And finally we have the book.

> (This includes changes by Brian Inglis.)  If you remove the lines
> which call system, you will end up with just the single file LMB.man
> (in about a quarter of a second).  You can treat this file just the
> same as your single-page example if you want to.
>
> The first system call creates the title page from the troff source
> file LMBfront.t and produces LMBfront.set; this can be added to your
> makefile as an entirely separate rule, depending on whether the .set
> file needs to be built.
>
> The second and third system calls are the calls to groff, which could
> be put into your makefile or split into separate stages to avoid
> parallelism.
>
> The second system call produces LinuxManBook.set, and the third
> combines this with LMBfront.set to produce the pdf.
>
> The "./" in the third system call is because I gave you a pre-release
> gropdf; you may be using the released 1.23.0 gropdf now.
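To make sure I understood the stages: here's how I'd translate those
three system calls into the one-rule-per-file make rules I want.  This
is only an untested sketch: it hardcodes a4 paper (your script takes
the paper size as an argument), assumes the released gropdf(1) in
$PATH, copies the flags verbatim from $cmdstring, and leaves LMB.man to
be produced by your script (the quarter-second part):

    GROFF_PDF_FLAGS = -Tpdf -k -pet -M. -F. -mandoc -manmark \
                      -dpaper=a4 -P-pa4 -rC1 -rCHECKSTYLE=3

    # Front page: troff source -> device-independent output.
    LMBfront.set: LMBfront.t
            groff -Tpdf -dpaper=a4 -P-pa4 -ms $< -Z > $@

    # Two passes: the first (-z) emits the .ds lines (bookmarks and
    # links), which are fed back into the second, ahead of the book.
    LinuxManBook.set: LMB.man
            groff -z -dPDF.EXPORT=1 -dLABEL.REFS=1 $< $(GROFF_PDF_FLAGS) 2>&1 \
            | LC_ALL=C grep '^\. *ds' \
            | groff $(GROFF_PDF_FLAGS) - $< -Z > $@

    # Combine the front page and the book into the final PDF.
    LinuxManBook.pdf: LMBfront.set LinuxManBook.set
            gropdf -F.:/usr/share/groff/current/font $^ -pa4 > $@

(Recipes need real tabs, of course.)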
> > > On my system this takes about 18 seconds to produce the 2800+
> > > pages of the book.  Of this, a quarter of a second is consumed by
> > > the "magic" part of the script; the rest of the 18 seconds is
> > > consumed by calls to groff and gropdf.
> >
> > But how much of that work needs to be on a single process?  I bought
> > a new CPU with 24 cores.  Gotta use them all :D
>
> I realise you are having difficulty in letting go of your idea of
> re-using previous work, rather than starting afresh each time.
> Imagine a single word change in one man page causes it to grow from 2
> pages to 3; all links to pages after this changed entry would then be
> one page adrift.  This is why very little previous work is useful, and
> why the whole book has to be dealt with as a single process.

Does such a change need re-running troff(1)?  Or is gropdf(1) enough?

My problem is probably that I don't know what's done by `gropdf`, and
what's done by `troff -Tpdf`.  I was hoping that `troff -Tpdf` still
didn't need to know about the entire book, and that only gropdf(1)
would need that.
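My current guess is that troff(1) does the typesetting and pagination,
while gropdf(1) only translates the device-independent output into PDF.
A rough sketch on the single-page example from the top of this mail
(untested; flags borrowed from $cmdstring):

    $ groff -Tpdf -k -pet -mandoc -Z man2/membarrier.2 > membarrier.set
    $ gropdf membarrier.set > membarrier.pdf

If that's right, then whenever a change reflows pages the .set file
itself changes, so the troff(1) step has to be re-run; re-running only
gropdf(1) wouldn't be enough.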
> If each entry was processed separately, as you would like in order to
> use all your shiny new cores, how would the process dealing with
> accept(2) know which page socket(2) would be on when it adds it as a
> link in the text?  I hope you can see that at some point it has to be
> treated as a homogenous whole in order to calculate correct links
> between entries.
>
> > > So any splitting of the perl script is only going to have an
> > > effect on the quarter of a second!
> > >
> > > I don't understand why the perl script can't be included in your
> > > make file as part of the build-pdf target.
> >
> > It can.  I just prefer to be strict about the Makefile having "one
> > rule per each file", while currently the script generates 4 files
> > (T, two .Z's, and the .pdf).
>
> Explained above how to separate them, so that the script only
> generates LMB.man and the system calls are moved to the makefile.

Thanks!

> > > Presumably it would be dependent on running after the scripts
> > > which add the revision label and date to each man page.
> >
> > I only set the revision and date on dist tarballs.  For the git HEAD
> > book, I'd keep the (unreleased) version and (date).  So, no worries
> > there.
>
> Given that you seem to intend to offer these interim books as a
> download, it would make sense if they included either a date or a git
> commit ID to differentiate them; if someone queries something, it
> would be useful to know exactly what they were looking at.

The books for releases are available at
<https://www.alejandro-colomar.es/share/dist/man-pages/6/6.05/6.05.01/man-pages-6.05.01.pdf>
(replace the version numbers for other versions, or navigate the dirs).
I need to document that in the README of the project.

For git HEAD, I plan to have something like
<https://www.alejandro-colomar.es/share/dist/man-pages/git/man-pages-HEAD.pdf>

It's mainly intended for easily checking what git HEAD looks like, and
discarding it later.  If the audience asks for version numbers, though,
I could provide `git describe` versions and dates in the pages.

> Cheers
>
> Deri
>
> > > > Since I don't understand Perl, and don't know much of gropdf(1)
> > > > either, I need help.
> > > >
> > > > Maybe Deri or Branden can help with that.  If anyone else
> > > > understands it and can also help, that's very welcome too!
> > >
> > > You are probably better placed to add the necessaries to your
> > > makefile.  You would then just need to remember to make build-pdf
> > > any time you alter one of the source man pages.  Since you are
> > > manually running my script to produce the pdf, it should not be
> > > difficult to automate it in a makefile.
> > >
> > > > Then I could install a hook in my server that runs
> > > >
> > > >     $ make build-pdf docdir=/srv/www/...
> > >
> > > And wait 18s each time the hook is actioned!!  Or, set the build
> > > to place the generated pdf somewhere in /srv/www/... and include
> > > the build in your normal workflow when a man page is changed.
> >
> > Hmm.  I still hope some of it can be parallelized, but 18s could be
> > reasonable, if the server does that in the background after pushing.
> > My old raspberry pi would burn, but the new computer should handle
> > that just fine.
>
> I'm confused.  The 18s is how long it takes to generate the book, so
> if the book is built in response to an access to a particular url, the
> http server can't start "pushing" for the 18s; then add on the
> transfer time for the pdf, and I suspect you will have a lot of
> aborted transfers.  Additionally, the script, and any makefile
> equivalent you write, is not designed for concurrent invocation, so if
> two people visit the same url within the 18-second window, neither
> user will receive a valid pdf.

No, my intention is that whenever I `git push` via SSH, the receiving
server runs `make build-book-pdf` after receiving the changes.  That is
run after the git SSH connection has closed, so I wouldn't notice.
HTTP connections wouldn't trigger anything in my server, except Nginx
serving the file, of course.

> I advise that the build becomes part of your workflow after making
> changes, and then place the pdf in a location where it can be served
> by the http server.
>
> Your model of slicing and dicing man pages to be processed
> individually is doable using a website to serve the individual pages,
> see:-
>
> http://chuzzlewit.co.uk/WebManPDF.pl/man:/2/accept
>
> This is running on a 1" cube no more powerful than a raspberry pi 3.
> The difference is that the "magic sauce" added to each man page sets
> the links to external http calls back to itself to produce another man
> page, rather than internal links to another part of the pdf.  You can
> get an index of all the man pages, on the (very old) system, here:
>
> http://chuzzlewit.co.uk/

Yep, I've seen that server :)

Long term, I also intend to provide one-page PDFs and HTML files of the
pages.  Although I prefer pre-generating them, instead of on-demand.
Maybe a git hook, or maybe a cron job that re-generates them once a day
or so.
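Something like this rough sketch is what I have in mind for the
pre-generation (untested; the output directory is made up, and the
flags are borrowed from $cmdstring):

    mkdir -p tmp/pdfpages
    for page in man?/*.[1-9]*; do
            groff -Tpdf -k -pet -mandoc "$page" \
                    > "tmp/pdfpages/$(basename "$page").pdf"
    done

Written as one makefile rule per page, that's also the kind of job that
`make -j24` can finally spread over all the cores, since without the
book-wide links each page is independent of the others.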
Cheers,
Alex

> Cheers
>
> Deri

-- 
<https://www.alejandro-colomar.es/>