Re: Issue in man page wcsncpy.3

Hi Helge,

At 2022-12-05T18:09:35+0100, Helge Kreutzmann wrote:
> Hello Alejandro,
> On Sun, Dec 04, 2022 at 09:44:47PM +0100, Alejandro Colomar wrote:
> > On 12/4/22 10:07, Helge Kreutzmann wrote:
> > > Without further ado, the following was found:
> > > 
> > > Issue:    Is the "L" in the bracket (for the NULL character) correct?
> > 
> > AFAIK, yes.  I never used it myself, but I believe L'\0' generates a "null
> > wide character".
> 
> Just to get this clear for myself, the man page currently use quoting
> characters (not plain ''), i.e. 
> 
> L\\(aq\\e0\\(aq

Right.  \(aq means "write out ASCII 39 decimal (U+0027)".[0]

> And this should not be translated? Currently I translate the quotes,
> i.e. in German this is marked as:
> 
> L»\\e0«
> 
> This is probably wrong?

Yes.  You are turning this into a set of typographical quotes for
written prose, but the expression is a literal constant in the C
language and must be typed using "straight single quotes" a.k.a. ASCII
39 decimal.

L'\0' is a null wide character, that is, a null character of type
wchar_t.  The language has this because '\0' without the L prefix is an
ordinary character constant, limited to what fits in a char of CHAR_BIT
bits; where wchar_t is wider than that, the L prefix makes the
constant's type and range match the object being initialized.

The following compiles without warnings on my system, even with -Wall.

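#include <stddef.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
        wchar_t w1 = '\0', w2 = L'\0';
        /* Cast so that %d matches whatever type wchar_t promotes to. */
        printf("%d\n", (int)(w1 + w2));
}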

For me this reliably writes "0" to the standard output.

That result is in fact guaranteed.  The initializer '\0' has the value
0, which is converted to wchar_t, so no bits of w1 are left
indeterminate.  What varies between implementations is the width and
signedness of wchar_t itself.

C is full of undefined and implementation-defined behavior.  This is
what makes it go fast and break stuff.
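
For the curious, here is a minimal sketch of how the two constants
differ.  The function names are made up for this message, and I am
assuming a hosted C99 (or later) implementation.  The constants have
different types, but they compare equal once the narrow one is converted.

```c
#include <stddef.h>   /* size_t, wchar_t */

/* '\0' is an ordinary character constant; in C its type is int. */
size_t narrow_null_size(void)
{
        return sizeof '\0';
}

/* L'\0' is a wide character constant; its type is wchar_t. */
size_t wide_null_size(void)
{
        return sizeof L'\0';
}

/* Initializing a wchar_t from '\0' converts the value 0, so the
   object compares equal to one initialized from L'\0'. */
int nulls_compare_equal(void)
{
        wchar_t w1 = '\0', w2 = L'\0';
        return w1 == w2;
}
```

Whether the two sizes agree depends on the platform; the comparison
does not.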

> Is there a way to note that this quotes are not to be translated even
> though they are not printed literally but with the macro \\(aq?

Technically, in roff parlance, that is not a macro, but a special
character escape sequence.[1]

> I explicitly ask this because using macros (markup) is a clear sign
> for me that it can be translated, and thus this breaks my heuristics.

That heuristic is not reliable.  \(aq and \(dq, among other
characters,[2] will often be used in man pages to _avoid_ the output of
glyphs common in a conventional prose typography context.

groff_char(7) surveys several kinds of quotation mark.  UTF-8 follows.

  Quotation marks
    The neutral double quote, often useful when documenting programming
    languages, is also available as a special character for convenient
    embedding in macro arguments; see subsection “Fundamental character
    set” above.

    Output   Input   Unicode   Notes
    ─────────────────────────────────────────────────────────────────────
    „        \[Bq]   u201E     low double comma quote
    ‚        \[bq]   u201A     low single comma quote
    “        \[lq]   u201C     left double quote
    ”        \[rq]   u201D     right double quote
    ‘        \[oq]   u2018     single opening (left) quote
    ’        \[cq]   u2019     single closing (right) quote
    '        \[aq]   u0027     apostrophe, neutral single quote
    "        "       u0022     neutral double quote
    "        \[dq]   u0022     neutral double quote
    «        \[Fo]   u00AB     left double chevron
    »        \[Fc]   u00BB     right double chevron
    ‹        \[fo]   u2039     left single chevron
    ›        \[fc]   u203A     right single chevron
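
If it helps with tooling, the table can be captured as a small lookup.
This is only a sketch: the struct, the function name, and the "neutral"
flag are my own inventions, not anything groff provides.  The flag marks
the two entries the table describes as neutral quotes.

```c
#include <stddef.h>
#include <string.h>

/* One row of the quotation mark table.  "neutral" is a hypothetical
   flag marking quotes that the table's notes call neutral. */
struct quote {
        const char *name;         /* special character name, e.g. "aq" */
        unsigned long codepoint;  /* Unicode code point */
        int neutral;              /* 1 for \[aq] and \[dq], else 0 */
};

static const struct quote quotes[] = {
        { "Bq", 0x201E, 0 }, { "bq", 0x201A, 0 },
        { "lq", 0x201C, 0 }, { "rq", 0x201D, 0 },
        { "oq", 0x2018, 0 }, { "cq", 0x2019, 0 },
        { "aq", 0x0027, 1 }, { "dq", 0x0022, 1 },
        { "Fo", 0x00AB, 0 }, { "Fc", 0x00BB, 0 },
        { "fo", 0x2039, 0 }, { "fc", 0x203A, 0 },
};

/* Return the table row for a special character name (case matters:
   "Bq" and "bq" are different), or NULL if the name is not listed. */
const struct quote *find_quote(const char *name)
{
        size_t i;

        for (i = 0; i < sizeof quotes / sizeof quotes[0]; i++)
                if (strcmp(quotes[i].name, name) == 0)
                        return &quotes[i];
        return NULL;
}
```

A checker for translations could call find_quote("aq") and leave the
character alone whenever the neutral member is set.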

Programming languages frequently attach important semantics to \(aq and
\(dq (ASCII ' and "), so it is important not to subject these to natural
language quotation mark transformations.
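
To make that concrete for C, here is a small sketch (the function names
are invented for this example).  The two neutral quotes delimit
different kinds of token, so swapping one for the other, or for a
typographic quote, changes or destroys the meaning.

```c
#include <stddef.h>   /* size_t */

/* 'a' is a character constant: a single value, of type int in C. */
size_t char_constant_size(void)
{
        return sizeof 'a';
}

/* "a" is a string literal: an array of two chars, namely 'a' and
   the terminating null character. */
size_t string_literal_size(void)
{
        return sizeof "a";
}
```

Replace either pair of quotes with » and « and the compiler no longer
sees a constant at all.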

Because these characters are so specialized, a man page that uses them
in prose is wrong.  In that case, you should translate the quotation
marks as if you were seeing \(lq, \(rq, \(oq, \(cq, and so forth.

Here's an example of erroneous input.

After reading from /proc/$$/mem, Anne\(aqs mom told her not to
\(dqparty\(dq.

The foregoing should be recast to use conventional punctuation and
typographer's quotes.

After reading from /proc/$$/mem, Anne's mom told her not to
\(lqparty\(rq.

The above uses en_US quotation; en_GB practice is different.

After reading from /proc/$$/mem, Anne's mom told her not to
\(oqparty\(cq.

...but experienced readers of English generally have little trouble
switching conventions.[3]

Regards,
Branden

[0] Technically, the glyph corresponding to it, and this will do the
    right thing even on OS/390 Unix, which uses code page 1047 (EBCDIC).
    There is a way to ask for glyph index 39 in the current font, but a
    man page should never fool with that.

[1] Once a groff user is good and comfortable with the distinction,
    someone comes along and does this.

    https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/node.cpp#n5029

    This is why manufacturers of voodoo dolls will never starve.

[2] From groff_man_style(7):

   • Some ASCII characters look funny or copy and paste wrong.
        On devices with large glyph repertoires, like UTF‐8‐capable
        terminals and PDF, several keyboard glyphs are mapped to code
        points outside the Unicode basic Latin range because that
        usually results in better typography in the general case.  When
        documenting GNU/Linux command or C language syntax, however,
        this translation is sometimes not desirable.

        To get a “literal”...   ...should be input.
        ────────────────────────────────────────────
                            '   \(aq
                            -   \-
                            \   \(rs
                            ^   \(ha
                            `   \(ga
                            ~   \(ti
        ────────────────────────────────────────────

        Additionally, if a neutral double quote (") is needed in a macro
        argument, you can use \(dq to get it.  You should not use \(aq
        for an ordinary apostrophe (as in “can’t”) or \- for an ordinary
        hyphen (as in “word‐aligned”).  Review subsection “Portability”
        above.

[3] The U.K. practice of dropping periods from abbreviations when the
    last letter of the abbreviated word remains intact is far more
    distracting and productive of ambiguity.
