Re: Converting man-pages to UTF-8

On Fri, Feb 14, 2014 at 11:43:30AM +0100, Michael Kerrisk (man-pages) wrote:
> At https://bugzilla.kernel.org/show_bug.cgi?id=60807 is a proposal to
> convert the pages of the "man-pages" project to UTF-8. I thought
> it worthwhile bringing that topic to the list, and CCing a few people
> who may have some ideas about this step, since I'm not too sure of the
> implications.
> 
> Peter Schiffer has kindly written some scripts to do the
> conversion, which would touch about 40 files. However, as far as I can
> tell, many of the pages that have non-ASCII characters have them inside
> groff comments (authors' names, etc.). The only pages that have
> non-ASCII characters in the rendered source are various man7 pages on
> character sets. These were the pages to which I added a groff encoding
> marker in response to Colin Watson's input on this Debian bug:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=519209
> 
> Moving to UTF-8 for the pages seems like a good idea, at least at some
> point. However, I'm wondering whether there are any backward
> compatibility issues that I need to worry about. As far as I
> know, groff added UTF-8 support back in Jan 2009, so, just over 5
> years ago. Perhaps that's long enough ago now, that any backward
> compatibility issues with old versions of groff would be minimal.
> (I.e., the number of people installing new man-pages on systems with
> old groff is likely to be very small, and anyway, only a dozen or so
> pages in Section 7 are affected. Furthermore, I'm assuming that Linux
> distros have been shipping groff v1.20+ for quite a long time now.)

I think for characters in comments you're probably fine, and any
problems you might have had should be gone as of groff 1.20.  Debian
switched to that in July 2009, and I think we were late to the party
because we had some difficult historical baggage to clean up at the same
time.  I'm not aware of anyone shipping older versions of groff any
more.
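
As a rough illustration of the comment-versus-rendered distinction, here is
a minimal Python sketch that reports which lines of a page carry non-ASCII
bytes inside groff comments and which carry them in text that gets rendered.
The function name classify_non_ascii is hypothetical, and the comment
detection is a simplification: it treats the first \" or \# on a line as the
start of a comment and does not handle an escaped backslash before the quote.

```python
import re

# Simplifying assumption: the first \" or \# on a line starts a groff
# comment that runs to the end of that line.
COMMENT_RE = re.compile(rb'\\["#]')


def classify_non_ascii(data: bytes):
    """Return (comment_lines, rendered_lines).

    Each list holds the 1-based numbers of lines that contain non-ASCII
    bytes in the comment part or in the rendered part, respectively.
    """
    comment_lines, rendered_lines = [], []
    for lineno, line in enumerate(data.splitlines(), start=1):
        match = COMMENT_RE.search(line)
        if match:
            text, comment = line[:match.start()], line[match.start():]
        else:
            text, comment = line, b""
        if any(byte > 0x7F for byte in text):
            rendered_lines.append(lineno)
        if any(byte > 0x7F for byte in comment):
            comment_lines.append(lineno)
    return comment_lines, rendered_lines
```

Pages for which the second list comes back empty only have non-ASCII
characters in comments, so they should be the low-risk ones.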

When you convert characters that show up in rendered source, I suspect
systems using the other man package (1.6g or similar versions) may
render them poorly, because it invokes nroff in some fairly naïve and
hardcoded ways.  However, they already break in various related ways,
and most distributions have switched to man-db now, or dealt with things
some other way.  My rough survey of the major distributions for this is:

  Arch has been good since about 2009

  Debian and descendants are good as of late 2007 / early 2008 (addition
  of manconv to man-db)

  Fedora is definitely good as of 2010 (switch to man-db), and I think
  was good before that as IIRC they did a flag day to switch everything
  to UTF-8 with man

  Gentoo switched to man-db at the end of 2013, so should be good now

  Mageia has a current groff, but uses man 1.6g with a stack of patches
  (some encoding-related)

  openSUSE has been fine for about the same length of time as Debian

  Slackware has a current groff, but uses man 1.6g without much in the
  way of special patches (just one to make things work for UTF-8
  *output*)

My guess is that Mageia and Slackware may find that things only work
properly for users in UTF-8 locales, but most other major distributions
should be fine.  You won't be the first author to switch to UTF-8 manual
pages; all you'll be doing is making existing shortcomings perhaps
marginally more obvious.  In any case, the pages currently encoded in
ISO-8859-1 won't be very seriously affected, and users of problematic
systems will only have been able to read the other pages with good luck
and a following wind anyway; switching to UTF-8 will probably actually
improve things for them if they're using a UTF-8 locale.  (That is, the
problems that the affected systems have generally relate to attempting
to read pages whose encoding doesn't match that of their locale.)  They
might possibly need to add the -k option to their nroff invocation in
man.conf.

If I were you I would just go ahead.

Regarding your questions in the bug, please do keep the "coding:" tag in
there; man-db will figure this out by brute force, but if left to its
own devices I think groff's preconv will default to the locale's
encoding, so it will only work for some people.
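
To illustrate keeping the tag through a conversion, here is a minimal Python
sketch that re-encodes one page to UTF-8 and rewrites an existing
Emacs-style "coding:" tag to match. This is not Peter's script; the
function name convert_page and the ISO-8859-1 fallback are assumptions for
the example. It looks for the tag only in the first two lines, and it
leaves a page that has no tag untagged rather than guessing where to add
one.

```python
import re

# Matches an Emacs-style tag such as: -*- coding: ISO-8859-1 -*-
CODING_RE = re.compile(rb'-\*-\s*coding:\s*([A-Za-z0-9._-]+)\s*-\*-')


def convert_page(data: bytes, default_encoding: str = "iso-8859-1") -> bytes:
    """Re-encode a man page to UTF-8, updating its coding tag if present."""
    lines = data.split(b"\n")
    source_encoding = default_encoding  # assumed when the page has no tag
    tag_line = None
    for index, line in enumerate(lines[:2]):
        match = CODING_RE.search(line)
        if match:
            source_encoding = match.group(1).decode("ascii")
            tag_line = index
            break

    out_lines = data.decode(source_encoding).split("\n")
    if tag_line is not None:
        # Keep the tag, but make it name the new encoding.
        out_lines[tag_line] = re.sub(
            r'(coding:\s*)[A-Za-z0-9._-]+', r'\1UTF-8',
            out_lines[tag_line], count=1)
    return "\n".join(out_lines).encode("utf-8")
```

Reading the source encoding from the tag, where there is one, avoids
decoding a page with the wrong charset; the fallback only applies to
untagged pages.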

Cheers,

-- 
Colin Watson                                       [cjwatson@xxxxxxxxxx]