Re: hyphens at ends of pages (was: No 6.05/.01 pdf book available)

On 2023-08-13 15:47, G. Branden Robinson wrote:
At 2023-08-13T14:30:34-0600, Brian Inglis wrote:
Please see attached awk script and logs showing pages with end of page
"hyphens" in text of PDFs from `pdftotext -layout`: "official" PDF has
47, newer PDFs break only at 5 compound word joins or double dashes.
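For readers without the attachment, here is a rough Python sketch of that kind of check. It is not the attached awk script. It assumes the `pdftotext -layout` output separates pages with form feed characters, and that each page ends with a fixed number of footer lines (the `footer_lines` parameter is a hypothetical knob; tune it to the book's layout).

```python
def end_of_page_hyphens(text, footer_lines=1):
    """Return (page_number, line) for each page whose last body line ends in a hyphen.

    Assumptions: pages are separated by form feeds, and the last
    `footer_lines` non-blank lines of each page are footer text to skip.
    """
    hits = []
    for number, page in enumerate(text.split("\f"), start=1):
        # Keep only non-blank lines, without trailing whitespace.
        lines = [ln.rstrip() for ln in page.splitlines() if ln.strip()]
        body = lines[:max(0, len(lines) - footer_lines)]
        # Accept both ASCII hyphen-minus and U+2010 HYPHEN.
        if body and body[-1].endswith(("-", "\u2010")):
            hits.append((number, body[-1].strip()))
    return hits
```

Note that this also counts lines ending in a double dash, the same way the totals above lump compound word joins and double dashes together.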

Later I said:

I added a paper variable, made the changes [to NA letter size], it seems
to work, and reduces end of page hyphens to one compound word instance in mbind(2); log attached:

	nodemask ... on-
	...
	line, ...

There appear to be 24 single word instances of online and 12 outdated
hyphenated compound word instances of on-line across all man pages.

Generating a letter size man book using 1.23.0 gropdf eliminates all spurious end of page hyphens except that one case, which the documentation you quote below allows.

Letter size yields about 5% (~140) more pages than A4 (2964 pp vs. 2823 pp), but the A4 pages break on 5 times as many compound words.

Also note that although on-line appears 12 times, and online twice that, offline and dial(up|out|ing)? each appear 7 times, with no instances of off-line or dial-up.
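A tally like this can be reproduced along the following lines. This is only a sketch in Python, not the method actually used for the counts above; the `VARIANTS` table and the function name are made up for illustration, and it assumes the text of the pages has already been read into a string. Words split across a line break ("on-" at the end of one line, "line" at the start of the next) are not joined.

```python
import re

# Hypothetical table of spelling variants to tally (case-insensitive).
# The (?!-) lookahead keeps "dial-up" from being counted as plain "dial".
VARIANTS = {
    "on-line": r"\bon-line\b",
    "online": r"\bonline\b",
    "off-line": r"\boff-line\b",
    "offline": r"\boffline\b",
    "dial-up": r"\bdial-up\b",
    "dial(up|out|ing)?": r"\bdial(?:up|out|ing)?\b(?!-)",
}

def count_variants(text):
    """Return a dict mapping each variant name to its number of matches."""
    return {
        name: len(re.findall(pattern, text, flags=re.IGNORECASE))
        for name, pattern in VARIANTS.items()
    }
```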

If hyphenation is occurring at the ends of pages but otherwise normally,
then that is a symptom of the *roff automatic hyphenation mode being set
wrong.  The most likely suspect is an argument-free `.hy` invocation
somewhere in the page sources.
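One way to hunt for such invocations is sketched below in Python. The regular expression and the function name are illustrative assumptions: it looks for a control character (`.` or `'`), optional spaces, the request name `hy`, and then nothing but whitespace or a `\"` comment. It does not catch a `.hy` hidden behind a conditional such as `.if` or inside a macro definition pulled in from elsewhere.

```python
import re

# Control character, optional spaces, "hy", then only whitespace
# or a \" comment up to the end of the line.
BARE_HY = re.compile(r"""^[.']\s*hy\s*(\\".*)?$""")

def bare_hy_lines(source):
    """Return (line_number, line) for every argument-free .hy request."""
    return [
        (number, line)
        for number, line in enumerate(source.splitlines(), start=1)
        if BARE_HY.match(line)
    ]
```

Run over each page source, this reports the line numbers to inspect by hand; `.hy 0`, `.hy 1`, and other requests that merely start with "hy" (such as `.hym`) are deliberately not reported.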

This is why I have nattered on about not messing with the hyphenation
mode in man page sources in recent mails (and commit messages) to this
list.[1]

In groff 1.23.0, we smuggled some of the explanation of *roff
hyphenation out of our Texinfo manual into the groff(7) page.  I'm
trimming a few sentences man page authors don't need.

Hyphenation
      When filling, groff hyphenates words as needed at user‐specified
      and automatically determined hyphenation points.  Explicitly
      hyphenated words such as “mother‐in‐law” are always eligible for
      breaking after each of their hyphens.  The hyphenation character \%
      and non‐printing break point \: escape sequences may be used to
      control the hyphenation and breaking of individual words.  [...]
      Otherwise, groff determines hyphenation points automatically by
      default.

      Several requests influence automatic hyphenation.  Because
      conventions vary, a variety of hyphenation modes is available to
      the .hy request; these determine whether hyphenation will apply to
      a word prior to breaking a line at the end of a page (more or less;
      see below for details), and at which positions within that word
      automatically determined hyphenation points are permissible.  The
      default is “1” for historical reasons, but this is not an
      appropriate value for the English hyphenation patterns used by
      groff; localization macro files loaded by troffrc and macro
      packages often override it.

      0    disables hyphenation.

      1    enables hyphenation except after the first and before the last
           character of a word.

      The remaining values “imply” 1; that is, they enable hyphenation
      under the same conditions as “.hy 1”, and then apply or lift
      restrictions relative to that basis.

      2    disables hyphenation of the last word on a page.  (Hyphenation
           is prevented if the next page location trap is closer to the
           vertical drawing position than the next text baseline would
           be.  See section “Traps” below.)
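The parenthetical rule for mode 2 can be read as a simple distance comparison. The Python below is only a toy model of that sentence, not groff's implementation; the function and parameter names are invented, and all three values are assumed to be in the same units, measured from the top of the page.

```python
def is_last_line_on_page(position, next_trap, line_spacing):
    """Model the mode 2 test quoted above.

    position     -- current vertical drawing position
    next_trap    -- position of the next page location trap
    line_spacing -- distance from `position` to the next text baseline

    Returns True when the trap is closer than the next baseline would be,
    which is the case where mode 2 prevents hyphenation.
    """
    return (next_trap - position) < line_spacing
```

For instance, with 12 units between baselines, a trap 10 units below the current position means the current line is treated as the last on the page, while a trap 20 units below does not.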

I have yet to evaluate the numbers of orphans, widows, and runts (single word widows) generated by each gropdf release, but there seems to be little apparent difference between 1.23.0 and Deri's 1.23.0+ new gropdf.

--
Take care. Thanks, Brian Inglis              Calgary, Alberta, Canada

La perfection est atteinte                   Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter  not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer     but when there is no more to cut
                                -- Antoine de Saint-Exupéry


