Hi Helge,

At 2022-12-05T18:09:35+0100, Helge Kreutzmann wrote:
> Hello Alejandro,
> On Sun, Dec 04, 2022 at 09:44:47PM +0100, Alejandro Colomar wrote:
> > On 12/4/22 10:07, Helge Kreutzmann wrote:
> > > Without further ado, the following was found:
> > >
> > > Issue: Is the "L" in the bracket (for the NULL character) correct?
> >
> > AFAIK, yes. I never used it myself, but I believe L'\0' generates a "null
> > wide character".
>
> Just to get this clear for myself, the man page currently use quoting
> characters (not plain ''), i.e.
>
> L\\(aq\\e0\\(aq

Right.  \(aq means "write out ASCII 39 decimal (U+0027)".[0]

> And this should not be translated? Currently I translate the quotes,
> i.e. in German this is marked as:
>
> L»\\e0«
>
> This is probably wrong?

Yes.  You are turning this into a set of typographical quotes for
written prose, but the expression is a literal constant in the C
language and must be typed using "straight single quotes" a.k.a. ASCII
39 decimal.

L'\0' is a null wide character, that is, a null character of type
wchar_t.  The language has this because '\0' without the L prefix is an
ordinary character constant, which in C has type int rather than
wchar_t; the L prefix gives the constant type wchar_t, which may be
wider than CHAR_BIT bits and can represent characters outside the basic
character set.

The following compiles without warnings on my system, even with -Wall.

    #include <stdio.h>
    #include <wchar.h>

    int main(int argc, char *argv[])
    {
        wchar_t w1 = '\0', w2 = L'\0';
        printf("%d\n", (int) (w1 + w2));
    }

For me this reliably writes "0" to the standard output.  As far as I can
tell that is guaranteed: '\0' is an int constant with value zero, and
converting it to wchar_t preserves the value, so the higher-order bits
of w1 are zero as well.  C is nonetheless full of undefined and
implementation-dependent behavior elsewhere.  This is what makes it go
fast and break stuff.

> Is there a way to note that this quotes are not to be translated even
> though they are not printed literally but with the macro \\(aq?

Technically, in roff parlance, that is not a macro, but a special
character escape sequence.[1]

> I explicitly ask this because using macros (markup) is a clear sign
> for me that it can be translated, and thus this breaks my heuristics.

That heuristic is not reliable.  \(aq and \(dq, among other
characters,[2] will often be used in man pages to _avoid_ the output of
glyphs common in a conventional prose typography context.

groff_char(7) surveys several kinds of quotation mark.  UTF-8 follows.

    Quotation marks
        The neutral double quote, often useful when documenting
        programming languages, is also available as a special character
        for convenient embedding in macro arguments; see subsection
        “Fundamental character set” above.

        Output   Input   Unicode   Notes
        ──────────────────────────────────────────────────────────────
        „        \[Bq]   u201E     low double comma quote
        ‚        \[bq]   u201A     low single comma quote
        “        \[lq]   u201C     left double quote
        ”        \[rq]   u201D     right double quote
        ‘        \[oq]   u2018     single opening (left) quote
        ’        \[cq]   u2019     single closing (right) quote
        '        \[aq]   u0027     apostrophe, neutral single quote
        "        "       u0022     neutral double quote
        "        \[dq]   u0022     neutral double quote
        «        \[Fo]   u00AB     left double chevron
        »        \[Fc]   u00BB     right double chevron
        ‹        \[fo]   u2039     left single chevron
        ›        \[fc]   u203A     right single chevron

Programming languages frequently attach important semantics to \(aq and
\(dq (ASCII ' and "), so it is important not to subject these to natural
language quotation mark transformations.

Because of their specialized nature, this also means that if you see a
man page using them in prose, the page is wrong.  You should translate
the quotation marks as if you were seeing \(lq, \(rq, \(oq, \(cq, and so
forth.

Here's an example of erroneous input.

    After reading from /proc/$$/mem, Anne\(aqs mom told her not to
    \(dqparty\(dq.

The foregoing should be recast to use conventional punctuation and
typographer's quotes.

    After reading from /proc/$$/mem, Anne's mom told her not to
    \(lqparty\(rq.

The above uses en_US quotation; en_GB practice is different.

    After reading from /proc/$$/mem, Anne's mom told her not to
    \(oqparty\(cq.

...but experienced readers of English generally have little trouble
switching conventions.[3]

Regards,
Branden

[0] Technically, the glyph corresponding to it, and this will do the
    right thing even on OS/390 Unix, which uses code page 1047 (EBCDIC).
    There is a way to ask for glyph index 39 in the current font, but a
    man page should never fool with that.

[1] Once a groff user is good and comfortable with the distinction,
    someone comes along and does this.

    https://git.savannah.gnu.org/cgit/groff.git/tree/src/roff/troff/node.cpp#n5029

    This is why manufacturers of voodoo dolls will never starve.

[2] From groff_man_style(7):

    • Some ASCII characters look funny or copy and paste wrong.

      On devices with large glyph repertoires, like UTF‐8‐capable
      terminals and PDF, several keyboard glyphs are mapped to code
      points outside the Unicode basic Latin range because that usually
      results in better typography in the general case.  When
      documenting GNU/Linux command or C language syntax, however, this
      translation is sometimes not desirable.

      To get a “literal”...   ...should be input.
      ────────────────────────────────────────────
                         '    \(aq
                         -    \-
                         \    \(rs
                         ^    \(ha
                         `    \(ga
                         ~    \(ti
      ────────────────────────────────────────────

      Additionally, if a neutral double quote (") is needed in a macro
      argument, you can use \(dq to get it.  You should not use \(aq
      for an ordinary apostrophe (as in “can’t”) or \- for an ordinary
      hyphen (as in “word‐aligned”).  Review subsection “Portability”
      above.

[3] The U.K. practice of dropping periods from abbreviations when the
    last letter of the abbreviated word remains intact is far more
    distracting and productive of ambiguity.