[looping in groff list]

Hi Alex,

At 2024-03-12T15:12:52+0100, Alejandro Colomar wrote:
> Hmm, interesting thing to try! I've tried it too, and the bookmarks
> for the in-page sections (e.g., DESCRIPTION, or rather ОПИСАНИЕ)
> appear with no name (or maybe it's a locale problem in my system?).
> See attached PDF.

That's a known problem with groff 1.23.0 and earlier.

It went less remarked-upon than it should have because it turns out
there is a way to sneak character codes with the eighth bit set as-is
out of the formatter into device-independent output. (I just learned
about this mechanism this past week.) And since a lot of groff users
were happy with the ISO 8859-1 character repertoire in their documents,
they were fine with it.

It's gone unresolved longer than it should have because fixing it is
challenging. If you catch up on groff mailing list traffic for January
and February you will see Deri and me discussing it. I have a solution
that I think will work,[1] but it keeps growing in scope.

The thing I learned last week is that the `\!` escape sequence can be
used to smuggle character codes 129-255 decimal into grout. (See
attachment.)[2]

Addressing this requires surgery on a part of the formatter that tends
to be used only by relative experts and there aren't unit tests of this
escape sequence to assuage my fear that I don't break things, so I'll
have to write them.

Regards,
Branden

[1] The main element of it is to have the `device` request and the `\X`
    escape sequence (the latter being an AT&T troff feature) read their
    parameters in copy mode.

    https://savannah.gnu.org/bugs/?64484

[2] Interestingly, what GNU troff does here is compatible with DWB 3.3
    troff but not Heirloom Doctools troff; but Heirloom otherwise has no
    problem emitting UTF-8 sequences, for instance as arguments to trout
    'C' commands.
My plan is to have GNU troff reject code points 128 <= n <= 255 in
arguments to the `device` and `output` requests (both GNU extensions)
and in `\!` and `\X` escape sequence parameters. We don't know what
character encoding an output device requires, so my proposal is to
require input documents (including macro packages) to express such code
points as groff Unicode special character escape sequences (that is, in
the form \[u123AB]).

An alternative would be to have the output device report what encoding
it requires in its DESC file, and give GNU troff the responsibility of
converting to that encoding when writing output. But to me that seems
like an inferior solution, loading up the formatter with more character
set-conversion functionality when it's increasingly a UTF-8 world
anyway. The likely persistent exception is the UTF-16-oriented PDF
device.

Fortunately, in groff 1.23.0, Deri added support to gropdf(1) for
interpretation of such escape sequences in device "specials" (device
control commands; "x X" commands in trout/grout). I'm attaching another
couple of examples to illustrate this.

Also, if we make the formatter strict about 7-bit-clean input in groff
1.24, that will clear the decks for moving from an assumption of Latin-1
input today to UTF-8 input in 1.25.
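To make the proposed convention concrete, here is a minimal Python sketch (not part of groff) of what "express such code points as \[uXXXX]" would mean for a document author or a preprocessing script. The function names `to_groff_escapes` and `is_seven_bit_clean` are hypothetical, chosen for illustration only, and the sketch does no normalization or decomposition of the input. In a real groff pipeline, preconv(1) is the tool that performs this kind of conversion.

```python
def to_groff_escapes(text: str) -> str:
    """Replace every non-ASCII character with a groff \\[uXXXX] escape.

    Hypothetical helper for illustration; it is not a groff API.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            # 7-bit characters pass through unchanged.
            out.append(ch)
        else:
            # At least four uppercase hex digits; code points above
            # U+FFFF naturally take five or six.
            out.append(f"\\[u{cp:04X}]")
    return "".join(out)


def is_seven_bit_clean(data: bytes) -> bool:
    """Return True if no byte in the input has its eighth bit set."""
    return all(b < 0x80 for b in data)


print(to_groff_escapes("A naïve attempt at bookmarking"))
```

Run as written, this prints `A na\[u00EF]ve attempt at bookmarking`, which matches the form used in the second bookmarking attachment. `is_seven_bit_clean` is the sort of check a 7-bit-strict formatter would implicitly apply to the arguments discussed above.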
.\" troff | hd # or your choice of hex dumper
Hello, world.
.sp
\!x X The Stupendous Yäppi will now read your mind!
.sp
Bye.
.\" groff -Kutf8 -Tpdf
.nr index 0 1
.de Section
. sp 1i
. ft B
. pdfbookmark 1 "\\$*"
. ds mark!\\n+[index] \\*[PDFBOOKMARK.NAME]
. nop \\$*
. ft
. sp
..
.Section "\%A naïve attempt at bookmarking"
Sed ut perspiciatis, unde omnis iste natus error sit voluptatem
accusantium doloremque laudantium, totam rem aperiam eaque ipsa, quae ab
illo inventore veritatis et quasi architecto beatae vitae dicta sunt,
explicabo.
Nemo enim ipsam voluptatem, quia voluptas sit, aspernatur aut odit aut
fugit, sed quia consequuntur magni dolores eos, qui ratione voluptatem
sequi nesciunt, neque porro quisquam est, qui dolorem ipsum, quia dolor
sit amet consectetur adipiscivelit, sed quia non-numquam eius modi
tempora incidunt, ut labore et dolore magnam aliquam quaerat voluptatem.
.bp
.Section "Another section"
Return to
.pdfhref L -D \*[mark!1] -- the first section
or
.pdfhref L -A . -D \*[mark!2] -- the last one
.\" groff -Tpdf
.\" needs groff 1.23.0 or later
.nr index 0 1
.de Section
. sp 1i
. ft B
. pdfbookmark 1 "\\$*"
. ds mark!\\n+[index] \\*[PDFBOOKMARK.NAME]
. nop \\$*
. ft
. sp
..
.Section "\%A na\[u00EF]ve attempt at bookmarking"
Sed ut perspiciatis, unde omnis iste natus error sit voluptatem
accusantium doloremque laudantium, totam rem aperiam eaque ipsa, quae ab
illo inventore veritatis et quasi architecto beatae vitae dicta sunt,
explicabo.
Nemo enim ipsam voluptatem, quia voluptas sit, aspernatur aut odit aut
fugit, sed quia consequuntur magni dolores eos, qui ratione voluptatem
sequi nesciunt, neque porro quisquam est, qui dolorem ipsum, quia dolor
sit amet consectetur adipiscivelit, sed quia non-numquam eius modi
tempora incidunt, ut labore et dolore magnam aliquam quaerat voluptatem.
.bp
.Section "Another section"
Return to
.pdfhref L -D \*[mark!1] -- the first section
or
.pdfhref L -A . -D \*[mark!2] -- the last one