Hi Alex,

At 2023-07-31T23:47:50+0200, Alejandro Colomar wrote:
> > When the text of all Linux man-pages documents (excluding those
> > containing only `so` requests) is dumped, with adjustment mode 'l'
> > ("-dAD=l") and automatic hyphenation disabled ("-rHY=0") before and
> > after this change, there is no change to rendered output.
>
> It would be interesting to see a script that corroborates the above
> paragraph.  It might help other projects that may want to migrate to
> MR.

Sure.  I used a couple of scripts.

    $ cat ATTIC/dump-pages.sh
    #!/bin/sh

    pages=$(grep -L '^\.so ' man*/* | sort)
    groff -t "$@" -m andoc -T utf8 -P -cbou $pages

    $ cat ATTIC/dump-pages-left-adjustment-no-hyphenation.sh
    #!/bin/sh

    pages=$(grep -L '^\.so ' man*/* | sort)
    groff -t -dAD=l -rHY=0 -m andoc -T utf8 -P -cbou $pages

And here's how I ran them.

    sh ATTIC/dump-pages.sh >| DUMP1
    sed -i -f ./ATTIC/MR.sed $(grep -L '^\.so ' man*/*)
    sh ATTIC/dump-pages-left-adjustment-no-hyphenation.sh >| DUMP2
    diff -U0 -b DUMP1 DUMP2 | less -R

That confirmed that there were "no changes" (with the caveat noted
above).

    sh ATTIC/dump-pages.sh >| DUMP2
    diff -U0 -b DUMP1 DUMP2 | less -R
    diff -U0 -b DUMP1 DUMP2 | wc -l

I used these to eyeball and measure whether there were any formatting
changes even with default adjustment and hyphenation enabled.  It showed
me _tons_ of man page names no longer getting broken (and hyphenated)
across lines, and nothing else that I noticed.  With the previous empty
diff in hand, I decided that I hadn't regressed the text of the pages.

If there are further sanity checks we can apply, I'm open to
suggestions.

Since you had me looking at my shell history, I'll share that I did a
"git co ." (co = alias for "checkout") 18 times in the course of
developing MR.sed.  Those drove most of my recent patch submissions
immediately prior to this one.  I could have done 18 more without
fatiguing (albeit not necessarily without frustration with myself for
not getting my sed right).
But that's the beauty of sed, and Bash/readline's
"reverse-search-history" and "operate-and-get-next" features.

As it turned out, my sed was pretty good, except for the missing use
case you identified, and my fix for which worked on the first try.  The
irregularity of the page inputs was the tricky bit.

At one point I had a fearful episode that I'd misdesigned `MR` for one
scenario, and much like the Master being terrorized by the Keller
Machine, I had visions of the Doctor (Ingo Schwarze) laughing at me and
telling me he told me so and winning the whole world over to mdoc(7) in
one stroke.  But it was fine (attached).

There are _still_ some `ad` requests scattered around (outside of tbl(1)
text blocks), but I didn't go after those because they weren't in the
way of my objective.  Eventually it'd be good to scrub those too.

> > I prepared this change with the following GNU sed script.
> >
> > \# Handle simplest cases: ".BR foo (1)" and ".IR foo (1)".
>
> What I do to avoid git messing with these comments is to write a
> leading space.  For git, only '#' in column 1 are special.  Since most
> compilers and interpreters allow a space before a commented line, a
> leading space is fine.

Ahh.  A leading backslash is the only workaround I've ever noticed.

> I've edited the commit message to have spaces, so that it's directly
> pastable into a MR.sed script.  Oh, and I included "$ cat MR.sed;" in
> the commit message; I couldn't not do it.  :)

No worries.  :)

> I've applied the patch (or rather, the script), but won't push it yet.
> If you send a run of commands that prove no differences before and
> after, I'll amend the commit message with it.

Please do verify it yourself with the tools above (or better ones).  I'm
well aware that this is a huge change that can make people nervous.

Regards,
Branden
Attachment:
try-to-break-MR.man
Description: Unix manual page
Attachment:
signature.asc
Description: PGP signature