Suppressing hyphenation (was: [PATCH] mctp.7: Add man page for Linux MCTP support)

[Jeremy Kerr dropped from CC--I hope that's okay]

Hi Alex,

Getting back to this after a month...

At 2021-10-18T09:53:54+0200, Alejandro Colomar (man-pages) wrote:
> On 10/18/21 9:16 AM, Alejandro Colomar (man-pages) wrote:
> > > So we might write
> > > 
> > > .B struct\~\%sockaddr_mctp
> > 
> > Okay.
> 
> Actually, wouldn't it be better to just write?:
> 
> .B \%struct\~sockaddr_mctp
> 
> This way \% applies to the whole (even if it was unnecessary for
> 'struct\~').

In fact it does not apply to the whole; '\~' still counts as a word
delimiter to groff even if it is not a permissible location for a
"break" (line break).

Before I bust out the long explanation, I'll try to present some short
advice for man page writers.

* If you wish to suppress hyphenation with the '\%' escape sequence,
  place it at the _beginning_ of each such word.  Except for special
  character escape sequences like '\-', '\(ha', and '\[aq]', most groff
  escape sequences act as word boundaries, so you may need to specify
  '\%' before each word in a series, as in '\%typedef\~int\~\%strsize'.
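To make the word-boundary rule concrete, here is a sketch of the two spellings from the mctp.7 discussion as they might appear in man page source. Only the placement of '\%' matters; the comments are illustrative and the 'typedef' line just reuses the hypothetical example above.

```roff
.\" '\~' is a word boundary, so this '\%' covers only "struct";
.\" "sockaddr_mctp" is a separate word and is not protected.
.B \%struct\~sockaddr_mctp
.\" Put '\%' at the start of the word that needs protection.
.B struct\~\%sockaddr_mctp
.\" In a series, each word that needs protection gets its own '\%'.
.B \%typedef\~int\~\%strsize
```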

Now for the deeper dive.

As strange as it may seem, this is consistent with the behavior of
hyphenation when it encounters most other escape sequences[1] (most of
which a portable man page should not attempt to use).  The key factor to
consider in matters of hyphenation suppression is where the _word
boundaries_ are, not where white space appears.

By contrast, anything that formats a glyph for output generally _is_
part of a word.  But only glyphs that are part of natural language words
(in English, [A-Za-z]) are eligible for adjacent hyphenation.

Here's the description of '\%' (and '\:') from the Info documentation
of the forthcoming groff 1.23.0 release.

[[
 -- Escape: \%
 -- Escape: \:
     To tell GNU 'troff' how to hyphenate words as they occur in input,
     use the '\%' escape, also known as the "hyphenation character".
     Each instance within a word indicates to GNU 'troff' that the word
     may be hyphenated at that point, while prefixing a word with this
     escape prevents it from being otherwise hyphenated.  This mechanism
     affects only that occurrence of the word; to change the hyphenation
     of a word for the remainder of input processing, use the 'hw'
     request.

     GNU 'troff' regards the escapes '\X' and '\Y' as starting a word;
     that is, the '\%' escape in, say, '\X'...'\%foobar' or
     '\Y'...'\%foobar' no longer prevents hyphenation of 'foobar' but
     inserts a hyphenation point just prior to it; most likely this
     isn't what you want.  *Note Postprocessor Access::.

     The '\:' escape inserts a non-printing break point; that is, the
     word can break there, but the soft hyphen glyph (see below) is not
     written to the output if it does.  This escape is an input word
     boundary, so the remainder of the word is subject to hyphenation as
     normal.

     You can use '\:' and '\%' in combination to control breaking of a
     file name or URL or to permit hyphenation only after certain
     explicit hyphens within a word.

          The \%Lethbridge-Stewart-\:\%Sackville-Baggins divorce
          was, in retrospect, inevitable once the contents of
          \%/var/log/\:\%httpd/\:\%access_log on the family web
          server came to light, revealing visitors from Hogwarts.
]]

Here's a short shell script to tell you where your installed
version of groff will hyphenate words: it forces hyphenation to occur at
every possible location.

$ cat ~/bin/hyphen
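#!/bin/sh

# For each word given as an argument, enable hyphenation and set the
# line length to one basic unit so that nroff breaks the word at every
# point it can.  -Wbreak inhibits the resulting warnings about lines
# that cannot be broken short enough; sed drops empty lines from the
# output, and tr rejoins the pieces onto one line.
for W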
do
    printf ".hy 4\n.ll 1u\n%s\n" "$W" | nroff -Wbreak | sed '/^$/d' \
        | tr -d '\n'
    echo
done

$ LC_ALL=C hyphen antidisestablishmentarianism 'struct\\~sockaddr'
an-tidis-es-tab-lish-men-tar-i-an-ism
struct\~sock-addr
$ LC_ALL=C hyphen sockaddr \\%sockaddr \\%sock\\%addr sock_addr sock^addr
sock-addr
sockaddr
sock-addr
sock_addr
sock^addr

(I set the locale so as to keep this email strictly "basic Latin";
groff will happily emit proper Unicode hyphens, U+2010, to a supporting
output device.)

You can see from the above that we can't recklessly sprinkle '\%': apart
from looking ugly, '\%' at the beginning of a word suppresses only
_automatic_ hyphenation.  If you specify it both at the beginning _and_
within a word, its other meaning of marking a hyphenation point is
still honored.
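A sketch of the three cases side by side, again using 'sockaddr_mctp' purely as an illustration. The last line follows the pattern of the Lethbridge-Stewart example in the manual excerpt: because '\:' is itself a word boundary, '\%' has to be repeated after it to keep suppressing automatic hyphenation.

```roff
.\" No hyphenation anywhere in the word.
.B \%sockaddr_mctp
.\" Automatic hyphenation is suppressed, but the inner '\%' still
.\" marks a hyphenation point after "sock".
.B \%sock\%addr_mctp
.\" A break without a hyphen is permitted after the underscore;
.\" the second '\%' suppresses automatic hyphenation of "mctp".
.B \%sockaddr_\:\%mctp
```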

Regards,
Branden

[1] There are a few exceptions, like those which "don't produce an input
token" as the groff Texinfo manual puts it, a construction that is more
intelligible to the groff developer than the groff user.  These
have to do with escape sequences that change the way glyphs are
rendered, such as changes to the font style or family, type size, or
stroke or fill colors.  Most of these should never occur in portable man
pages, and even '\f' is, in my view, better handled with man(7) font
style macros and the '\c' escape sequence if required for break
suppression.
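For instance, a hypothetical option name set in bold, first with '\f' escapes and then with man(7) font style macros, which apply the style to their arguments and switch back on their own, so there is no closing escape to forget:

```roff
.\" Escape-based: the font change must be undone by hand with \fP.
Use the \fB\-\-verbose\fP option.
.\" Macro-based: .B sets its argument in bold, then switches back.
Use the
.B \-\-verbose
option.
.\" .BR alternates bold and roman, as in a cross reference.
See
.BR mctp (7).
```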
