Hi Brian,

At 2023-08-13T14:30:34-0600, Brian Inglis wrote:
> Please see attached awk script and logs showing pages with end of page
> "hyphens" in text of PDFs from `pdftotext -layout`: "official" PDF has
> 47, newer PDFs break only at 5 compound word joins or double dashes.

If hyphenation is occurring at the ends of pages but otherwise
normally, then that is a symptom of the *roff automatic hyphenation
mode being set wrong.  The most likely suspect is an argument-free
`.hy` invocation somewhere in the page sources.

This is why I have nattered on about not messing with the hyphenation
mode in man page sources in recent mails (and commit messages) to this
list.[1]

In groff 1.23.0, we smuggled some of the explanation of *roff
hyphenation out of our Texinfo manual into the groff(7) page.  I'm
trimming a few sentences man page authors don't need.

   Hyphenation
       When filling, groff hyphenates words as needed at user‐specified
       and automatically determined hyphenation points.  Explicitly
       hyphenated words such as “mother‐in‐law” are always eligible for
       breaking after each of their hyphens.  The hyphenation character
       \% and non‐printing break point \: escape sequences may be used
       to control the hyphenation and breaking of individual words.

       [...]

       Otherwise, groff determines hyphenation points automatically by
       default.

       Several requests influence automatic hyphenation.  Because
       conventions vary, a variety of hyphenation modes is available to
       the .hy request; these determine whether hyphenation will apply
       to a word prior to breaking a line at the end of a page (more or
       less; see below for details), and at which positions within that
       word automatically determined hyphenation points are permissible.
       The default is “1” for historical reasons, but this is not an
       appropriate value for the English hyphenation patterns used by
       groff; localization macro files loaded by troffrc and macro
       packages often override it.

       0      disables hyphenation.

       1      enables hyphenation except after the first and before the
              last character of a word.

       The remaining values “imply” 1; that is, they enable hyphenation
       under the same conditions as “.hy 1”, and then apply or lift
       restrictions relative to that basis.

       2      disables hyphenation of the last word on a page.
              (Hyphenation is prevented if the next page location trap
              is closer to the vertical drawing position than the next
              text baseline would be.  See section “Traps” below.)

       [...]

Regards,
Branden

[1] https://lore.kernel.org/linux-man/20230730200321.ocribgmh2fmk2gto@illithid/
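For readers who want the mode 0, 1, and 2 rules from the groff(7) excerpt above in executable form, here is a minimal Python sketch.  It is a model, not groff's implementation: the function names are hypothetical, it assumes mode values combine as bit flags (only the 2 bit is examined), it ignores every restriction elided at "[...]", and it approximates "the next text baseline" as the current drawing position plus one vertical spacing.

```python
def last_line_on_page(position, next_trap, vertical_spacing):
    """Hypothetical model of the mode 2 test quoted from groff(7).

    The line is treated as the last on the page when the next page
    location trap is closer to the drawing position than the next text
    baseline would be.  Assumption: the next baseline lies exactly one
    vertical spacing below the current position (units are arbitrary).
    """
    return (next_trap - position) < vertical_spacing


def permitted_breaks(word, candidates, mode, at_page_end=False):
    """Filter candidate automatic hyphenation points for a word.

    A candidate i means a break between word[:i] and word[i:].
    Only the behaviour of modes 0, 1 and 2 is modelled; treating the
    mode as a bit set is an assumption of this sketch.
    """
    if mode == 0:
        return []  # 0 disables hyphenation.
    if mode & 2 and at_page_end:
        return []  # 2 bit: do not hyphenate the last word on a page.
    # 1 (implied by every nonzero mode): no break after the first
    # character or before the last character of the word.
    return [i for i in candidates if 1 < i < len(word) - 1]


if __name__ == "__main__":
    # Made-up candidates, not groff's patterns; 1 and 10 are
    # deliberately bad (after first / before last character).
    word, cands = "hyphenation", [1, 2, 6, 10]
    at_end = last_line_on_page(position=700, next_trap=710, vertical_spacing=12)
    print(permitted_breaks(word, cands, 1, at_page_end=at_end))
    print(permitted_breaks(word, cands, 2, at_page_end=at_end))
```

In this model, mode 1 alone places no restriction on the last line of a page (the first call prints `[2, 6]`), whereas a mode with the 2 bit set suppresses hyphenation there (the second call prints `[]`).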