On 2023-08-13 15:47, G. Branden Robinson wrote:
At 2023-08-13T14:30:34-0600, Brian Inglis wrote:
Please see attached awk script and logs showing pages with end of page
"hyphens" in text of PDFs from `pdftotext -layout`: "official" PDF has
47, newer PDFs break only at 5 compound word joins or double dashes.
Later I said:
I added a paper variable, made the changes [to NA letter size], it seems
to work, and reduces end of page hyphens to one compound word instance in
mbind(2); log attached:
nodemask ... on-
...
line, ...
There appear to be 24 single word instances of online and 12 outdated
hyphenated compound word instances of on-line across all man pages.
Generating a letter size man book using 1.23.0 gropdf eliminates all spurious
end of page hyphens barring that case, allowed by your doc below.
With letter at 2964pp, about 5% (roughly 140 pages) more than A4 at 2823pp,
the A4 pages still break on 5 times as many compound words.
Also note that although on-line appears 12 times and online twice that, offline
and dial(up|out|ing)? each appear 7 times, with no instances of off-line or dial-up.
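For anyone without the attached awk script, here is a rough Python sketch of the same kind of check. It is not the script itself. It assumes `pdftotext -layout` output with pages separated by form feeds, and that the last non-blank line of a page is body text rather than a footer. The function name and the `footer_lines` parameter are made up for illustration, as is the set of characters treated as hyphens.

```python
def pages_ending_in_hyphen(text, footer_lines=0):
    """Return (page_number, last_line) for pages whose final text line ends in a hyphen.

    Assumptions (not taken from the original script): pages are separated by
    form feeds, and `footer_lines` trailing non-blank lines per page are
    footers to be ignored.
    """
    hyphens = ("-", "\u2010", "\u00ad")  # ASCII hyphen, U+2010, soft hyphen
    hits = []
    for number, page in enumerate(text.split("\f"), start=1):
        # Keep only non-blank lines, trimmed of trailing whitespace.
        lines = [ln.rstrip() for ln in page.splitlines() if ln.strip()]
        if footer_lines:
            lines = lines[:-footer_lines]
        if lines and lines[-1].endswith(hyphens):
            hits.append((number, lines[-1].strip()))
    return hits
```

Feed it the whole text file as one string. It flags intentional compound-word breaks as well as automatic ones, so the hits still need a look by eye.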
If hyphenation is occurring at the ends of pages but otherwise normally,
then that is a symptom of the *roff automatic hyphenation mode being set
wrong. The most likely suspect is an argument-free `.hy` invocation
somewhere in the page sources.
This is why I have nattered on about not messing with the hyphenation
mode in man page sources in recent mails (and commit messages) to this
list.[1]
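One way to hunt for the argument-free `.hy` invocations suspected above is a scan of the page sources along these lines. This is a minimal sketch, assuming the default control characters (`.` and `'`) are in effect and that a trailing `\"` comment does not count as an argument. The function name is hypothetical.

```python
import re

# A control line whose request is exactly "hy", followed by nothing
# but optional whitespace and an optional \" comment.
BARE_HY = re.compile(r"""^[.'][ \t]*hy[ \t]*(?:\\".*)?$""")

def find_bare_hy(source):
    """Return 1-based line numbers of `.hy` requests that carry no argument."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if BARE_HY.match(line)
    ]
```

Requests such as `.hym` or `.hy 0` are deliberately not matched. Pages that change the control character with `.cc`, or that invoke `.hy` from inside a macro definition, would need more care than this.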
In groff 1.23.0, we smuggled some of the explanation of *roff
hyphenation out of our Texinfo manual into the groff(7) page. I'm
trimming a few sentences man page authors don't need.
Hyphenation
When filling, groff hyphenates words as needed at user‐specified
and automatically determined hyphenation points. Explicitly
hyphenated words such as “mother‐in‐law” are always eligible for
breaking after each of their hyphens. The hyphenation character \%
and non‐printing break point \: escape sequences may be used to
control the hyphenation and breaking of individual words. [...]
Otherwise, groff determines hyphenation points automatically by
default.
Several requests influence automatic hyphenation. Because
conventions vary, a variety of hyphenation modes is available to
the .hy request; these determine whether hyphenation will apply to
a word prior to breaking a line at the end of a page (more or less;
see below for details), and at which positions within that word
automatically determined hyphenation points are permissible. The
default is “1” for historical reasons, but this is not an
appropriate value for the English hyphenation patterns used by
groff; localization macro files loaded by troffrc and macro
packages often override it.
0 disables hyphenation.
1 enables hyphenation except after the first and before the last
character of a word.
The remaining values “imply” 1; that is, they enable hyphenation
under the same conditions as “.hy 1”, and then apply or lift
restrictions relative to that basis.
2 disables hyphenation of the last word on a page. (Hyphenation
is prevented if the next page location trap is closer to the
vertical drawing position than the next text baseline would
be. See section “Traps” below.)
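To make the "imply 1" wording concrete, here is a small sketch that spells out a mode value using only the three values quoted above. It assumes the modes combine additively as bit flags, which the trimmed excerpt does not state, and it reports any other bits as undescribed rather than guessing their meaning. The function name is made up.

```python
def describe_hy_mode(mode):
    """Describe a .hy mode using only the values 0, 1 and 2 quoted above.

    Assumption: mode values are additive bit flags; bits other than 1 and 2
    are reported but not interpreted.
    """
    if mode == 0:
        return ["hyphenation disabled"]
    # Every nonzero value implies the conditions of ".hy 1".
    notes = ["hyphenation enabled, except after the first and "
             "before the last character of a word"]
    if mode & 2:
        notes.append("last word on a page is not hyphenated")
    other = mode & ~3
    if other:
        notes.append("other bits set (%d), not described in this excerpt" % other)
    return notes
```

Under that reading, `.hy 2` and `.hy 3` come out the same, and an argument-free `.hy` that lands on mode 1 drops the end-of-page restriction, which would fit the symptom described.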
I have yet to evaluate the numbers of orphans, widows, and runts (single word
widows) generated by each gropdf release, but there seems to be little apparent
difference between 1.23.0 and Deri's 1.23.0+ new gropdf.
--
Take care. Thanks, Brian Inglis Calgary, Alberta, Canada
La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
-- Antoine de Saint-Exupéry