hyphens at ends of pages (was: No 6.05/.01 pdf book available)

Hi Brian,

At 2023-08-13T14:30:34-0600, Brian Inglis wrote:
> Please see attached awk script and logs showing pages with end of page
> "hyphens" in text of PDFs from `pdftotext -layout`: "official" PDF has
> 47, newer PDFs break only at 5 compound word joins or double dashes.

If words are being hyphenated at the ends of pages but hyphenation
otherwise behaves normally, that is a symptom of the *roff automatic
hyphenation mode being set wrong.  The most likely suspect is an
argument-free `.hy` invocation somewhere in the page sources.
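
To hunt for that suspect, something like the following Python sketch
might do.  It is not part of groff or of any man-pages tooling; the name
`find_bare_hy` and the regular expression are my own assumptions, and the
pattern only catches a `hy` request standing alone on its control line
(optionally followed by a `\"` comment), not one tucked inside an `.if`
or `.ie` clause.

```python
import re

# Assumed pattern: a control character (. or '), optional blanks, the
# request name "hy", optional blanks, and at most a \" comment after it.
BARE_HY = re.compile(r"""^[.'][ \t]*hy[ \t]*(?:\\".*)?$""")


def find_bare_hy(text):
    """Return the 1-based line numbers of argument-free `hy` requests."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if BARE_HY.match(line)
    ]
```

Feeding it the contents of each page source and printing the returned
line numbers should be enough to narrow down where the mode is being
reset.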

This is why I have nattered on about not messing with the hyphenation
mode in man page sources in recent mails (and commit messages) to this
list.[1]

In groff 1.23.0, we smuggled some of the explanation of *roff
hyphenation out of our Texinfo manual into the groff(7) page.  Here is
the relevant excerpt, trimmed of a few sentences man page authors don't
need.

Hyphenation
     When filling, groff hyphenates words as needed at user‐specified
     and automatically determined hyphenation points.  Explicitly
     hyphenated words such as “mother‐in‐law” are always eligible for
     breaking after each of their hyphens.  The hyphenation character \%
     and non‐printing break point \: escape sequences may be used to
     control the hyphenation and breaking of individual words.  [...]
     Otherwise, groff determines hyphenation points automatically by
     default.

     Several requests influence automatic hyphenation.  Because
     conventions vary, a variety of hyphenation modes is available to
     the .hy request; these determine whether hyphenation will apply to
     a word prior to breaking a line at the end of a page (more or less;
     see below for details), and at which positions within that word
     automatically determined hyphenation points are permissible.  The
     default is “1” for historical reasons, but this is not an
     appropriate value for the English hyphenation patterns used by
     groff; localization macro files loaded by troffrc and macro
     packages often override it.

     0    disables hyphenation.

     1    enables hyphenation except after the first and before the last
          character of a word.

     The remaining values “imply” 1; that is, they enable hyphenation
     under the same conditions as “.hy 1”, and then apply or lift
     restrictions relative to that basis.

     2    disables hyphenation of the last word on a page.  (Hyphenation
          is prevented if the next page location trap is closer to the
          vertical drawing position than the next text baseline would
          be.  See section “Traps” below.)
[...]
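
As a rough illustration of the end-of-page rule quoted above, here is a
toy Python model covering only modes 0, 1, and 2 as the excerpt
describes them.  It is not how groff implements anything; the function
name, its parameters, and the use of plain numbers for distances are
assumptions of mine, and the modes elided from the excerpt are
deliberately rejected rather than guessed at.

```python
def hyphenation_allowed(mode, distance_to_trap, distance_to_next_baseline):
    """Toy model: may the word ending the current line be hyphenated?

    mode                       0, 1, or 2, as in the excerpt above
    distance_to_trap           from the vertical drawing position down to
                               the next page location trap
    distance_to_next_baseline  from the vertical drawing position down to
                               where the next text baseline would be
    """
    if mode not in (0, 1, 2):
        # The excerpt elides the remaining modes; this sketch does not
        # model them.
        raise ValueError("only modes 0, 1, and 2 are modeled here")
    if mode == 0:
        return False  # hyphenation disabled entirely
    if mode == 2 and distance_to_trap < distance_to_next_baseline:
        return False  # the trap is closer than the next baseline
    return True
```

With mode 1 the check against the trap never happens, which is the
behavior that lets hyphens land at the ends of pages.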

Regards,
Branden

[1] https://lore.kernel.org/linux-man/20230730200321.ocribgmh2fmk2gto@illithid/


