[Jeremy Kerr dropped from CC--I hope that's okay]

Hi Alex,

Getting back to this after a month...

At 2021-10-18T09:53:54+0200, Alejandro Colomar (man-pages) wrote:
> On 10/18/21 9:16 AM, Alejandro Colomar (man-pages) wrote:
> > > So we might write
> > >
> > > .B struct\~\%sockaddr_mctp
> >
> > Okay.
>
> Actually, wouldn't it be better to just write?:
>
> .B \%struct\~sockaddr_mctp
>
> This way \% applies to the whole (even if it was unnecessary for
> 'struct\~').

In fact it does not apply to the whole; '\~' still counts as a word
delimiter to groff even if it is not a permissible location for a
"break" (line break).

Before I bust out the long explanation, I'll try to present some short
advice for man page writers.

* If you wish to suppress hyphenation with the '\%' escape sequence,
  place it at the _beginning_ of each such word. Except for special
  character escape sequences like '\-', '\(ha', and '\[aq]', most groff
  escape sequences act as word boundaries, so you may need to specify
  '\%' before each word in a series, as in
  '\%typedef\~int\~\%strsize'.

Now for the deeper dive.

As strange as it may seem, this is consistent with the behavior of
hyphenation when it encounters most other escape sequences[1] (most of
which a portable man page should not attempt to use). The key factor to
consider in matters of hyphenation suppression is where the _word
boundaries_ are, not where white space appears.

By contrast, anything that formats a glyph for output generally _is_
part of a word. But only glyphs that are part of natural language words
(in English, [A-Za-z]) are eligible for adjacent hyphenation.

Here's the documentation of '\%' (and '\:') from the Info documentation
of the forthcoming groff 1.23.0 release.

[[
 -- Escape: \%
 -- Escape: \:
     To tell GNU 'troff' how to hyphenate words as they occur in input,
     use the '\%' escape, also known as the "hyphenation character".
     Each instance within a word indicates to GNU 'troff' that the word
     may be hyphenated at that point, while prefixing a word with this
     escape prevents it from being otherwise hyphenated. This mechanism
     affects only that occurrence of the word; to change the hyphenation
     of a word for the remainder of input processing, use the 'hw'
     request.

     GNU 'troff' regards the escapes '\X' and '\Y' as starting a word;
     that is, the '\%' escape in, say, '\X'...'\%foobar' or
     '\Y'...'\%foobar' no longer prevents hyphenation of 'foobar' but
     inserts a hyphenation point just prior to it; most likely this
     isn't what you want. *Note Postprocessor Access::.

     The '\:' escape inserts a non-printing break point; that is, the
     word can break there, but the soft hyphen glyph (see below) is not
     written to the output if it does. This escape is an input word
     boundary, so the remainder of the word is subject to hyphenation as
     normal.

     You can use '\:' and '\%' in combination to control breaking of a
     file name or URL or to permit hyphenation only after certain
     explicit hyphens within a word.

          The \%Lethbridge-Stewart-\:\%Sackville-Baggins divorce was,
          in retrospect, inevitable once the contents of
          \%/var/log/\:\%httpd/\:\%access_log on the family web
          server came to light, revealing visitors from Hogwarts.
]]

Here's a short shell script to tell you where your installed version of
groff will hyphenate words: it forces hyphenation to occur at every
possible location.

$ cat ~/bin/hyphen
#!/bin/sh

for W
do
    printf ".hy 4\n.ll 1u\n%s\n" "$W" | nroff -Wbreak | sed '/^$/d' \
        | tr -d '\n'
    echo
done
$ LC_ALL=C hyphen antidisestablishmentarianism 'struct\\~sockaddr'
an-tidis-es-tab-lish-men-tar-i-an-ism
struct\~sock-addr
$ LC_ALL=C hyphen sockaddr \\%sockaddr \\%sock\\%addr sock_addr sock^addr
sock-addr
sockaddr
sock-addr
sock_addr
sock^addr

(I set the locale so as to keep this email strictly "basic Latin"; groff
will happily emit proper Unicode hyphens U+2010 to a supporting output
device.)
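
A note on the quoting in those commands, since it is easy to misread:
the shell removes one level of backslashes from unquoted arguments but
leaves single-quoted ones alone. The sketch below only echoes each
argument as the script would receive it; show_args is a hypothetical
name, not part of groff. I am assuming that groff then formats a
remaining '\\' as a literal backslash, which would explain why the
first run printed '\~' instead of interpreting it.

```shell
#!/bin/sh
# show_args: hypothetical helper (not part of groff); prints each
# argument exactly as the shell passes it on, one per line.
show_args() {
    for W
    do
        # '%s' copies the argument verbatim; printf does not
        # interpret backslashes inside it.
        printf '%s\n' "$W"
    done
}

# Same arguments as in the hyphen runs above.
show_args sockaddr \\%sockaddr \\%sock\\%addr 'struct\\~sockaddr'
```

The unquoted arguments arrive as '\%sockaddr' and '\%sock\%addr',
while the single-quoted one arrives with both backslashes intact, as
'struct\\~sockaddr'.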
You can see from the above that we can't recklessly sprinkle '\%': apart
from looking ugly, '\%' at the beginning of a word suppresses only
_automatic_ hyphenation. If you specify it both at the beginning _and_
within a word, its other meaning of marking a hyphenation point is still
honored.

Regards,
Branden

[1] There are a few exceptions, like those which "don't produce an input
    token" as the groff Texinfo manual puts it, a construction that is
    more intelligible to the groff developer than the groff user. These
    have to do with escape sequences that change the way glyphs are
    rendered, such as changes to the font style or family, type size, or
    stroke or fill colors. Most of these should never occur in portable
    man pages and even '\f' is, in my view, better handled with man(7)
    font style macros and the '\c' escape sequence if required for break
    suppression.
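
P.S. Returning to the short advice near the top (put '\%' before each
word in a series): here is a minimal sketch of a shell helper that
builds such a string so the escapes need not be typed by hand. The name
nohyph is hypothetical, not something groff or the man-pages project
ships. It prefixes every word, including ones like 'int' that would not
be hyphenated anyway; that is harmless but more than strictly needed.

```shell
#!/bin/sh
# nohyph: hypothetical helper, not part of groff or man-pages.
# Joins its arguments with '\~' (unbreakable space) and prefixes each
# one with '\%' so that no word in the series is hyphenated
# automatically.
nohyph() {
    out=
    for W
    do
        if [ -z "$out" ]
        then
            out="\\%$W"            # first word: just the '\%' prefix
        else
            out="$out\\~\\%$W"     # later words: '\~' then '\%'
        fi
    done
    printf '%s\n' "$out"
}

nohyph struct sockaddr_mctp
```

With the arguments shown it prints '\%struct\~\%sockaddr_mctp', which
can be pasted after '.B' in a man page source.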