[Bug 60807] not all the pages are encoded using utf-8

https://bugzilla.kernel.org/show_bug.cgi?id=60807

Michael Kerrisk <mtk.manpages@xxxxxxxxx> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mtk.manpages@xxxxxxxxx

--- Comment #4 from Michael Kerrisk <mtk.manpages@xxxxxxxxx> ---
(In reply to Peter Schiffer from comment #3)
> $ ./print_encoding.sh man?/*
> 
>    Man Page               Encoding by file   Encoding by first line
> 
>  * man2/close.2           iso-8859-1         
>  * man2/getdomainname.2   iso-8859-1         
>  * man2/getrlimit.2       iso-8859-1         
>  * man2/madvise.2         iso-8859-1         
>  * man2/mount.2           utf-8              
>  * man2/sysinfo.2         iso-8859-1         
>  * man2/umask.2           iso-8859-1         
>  * man3/encrypt.3         iso-8859-1         
>  * man3/fclose.3          iso-8859-1         
>  * man3/fflush.3          iso-8859-1         
>  * man3/lockf.3           iso-8859-1         
>  * man3/rand.3            iso-8859-1         
>  * man3/strtok.3          iso-8859-1         
>  * man3/toupper.3         iso-8859-1         
>  * man3/updwtmp.3         iso-8859-1         
>  * man4/st.4              utf-8              
>  * man5/utmp.5            iso-8859-1         
>  * man7/armscii-8.7       iso-8859-1         ARMSCII-8
>  * man7/cp1251.7          unknown-8bit       CP1251
>  * man7/environ.7         iso-8859-1         
>  * man7/hier.7            iso-8859-1         
>  * man7/iso_8859-10.7     iso-8859-1         ISO-8859-10
>  * man7/iso_8859-11.7     iso-8859-1         ISO-8859-11
>  * man7/iso_8859-13.7     iso-8859-1         ISO-8859-7
>  * man7/iso_8859-14.7     iso-8859-1         ISO-8859-14
>  * man7/iso_8859-15.7     iso-8859-1         ISO-8859-15
>  * man7/iso_8859-16.7     iso-8859-1         ISO-8859-16
>  * man7/iso_8859-1.7      iso-8859-1         
>  * man7/iso_8859-2.7      iso-8859-1         ISO-8859-2
>  * man7/iso_8859-3.7      iso-8859-1         ISO-8859-3
>  * man7/iso_8859-4.7      iso-8859-1         ISO-8859-4
>  * man7/iso_8859-5.7      iso-8859-1         ISO-8859-5
>  * man7/iso_8859-6.7      iso-8859-1         ISO-8859-6
>  * man7/iso_8859-7.7      iso-8859-1         ISO-8859-7
>  * man7/iso_8859-8.7      iso-8859-1         ISO-8859-8
>  * man7/iso_8859-9.7      iso-8859-1         ISO-8859-9
>  * man7/koi8-r.7          unknown-8bit       KOI8-R
>  * man7/koi8-u.7          unknown-8bit       
>  * man7/suffixes.7        iso-8859-1         
> 
> $ ./convert_to_utf_8.sh tmp_encoded man?/*
> Converting man2/close.2            from iso-8859-1
> Converting man2/getdomainname.2    from iso-8859-1
> Converting man2/getrlimit.2        from iso-8859-1
> Converting man2/madvise.2          from iso-8859-1
> Converting man2/mount.2            from utf-8
> Converting man2/sysinfo.2          from iso-8859-1
> Converting man2/umask.2            from iso-8859-1
> Converting man3/encrypt.3          from iso-8859-1
> Converting man3/fclose.3           from iso-8859-1
> Converting man3/fflush.3           from iso-8859-1
> Converting man3/lockf.3            from iso-8859-1
> Converting man3/rand.3             from iso-8859-1
> Converting man3/strtok.3           from iso-8859-1
> Converting man3/toupper.3          from iso-8859-1
> Converting man3/updwtmp.3          from iso-8859-1
> Converting man4/st.4               from utf-8
> Converting man5/utmp.5             from iso-8859-1
> Converting man7/armscii-8.7        from armscii-8
> Converting man7/cp1251.7           from cp1251
> Converting man7/environ.7          from iso-8859-1
> Converting man7/hier.7             from iso-8859-1
> Converting man7/iso_8859-10.7      from iso_8859-10
> Converting man7/iso_8859-11.7      from iso-8859-1
> Converting man7/iso_8859-13.7      from iso-8859-1
> Converting man7/iso_8859-14.7      from iso_8859-14
> Converting man7/iso_8859-15.7      from iso_8859-15
> Converting man7/iso_8859-16.7      from iso_8859-16
> Converting man7/iso_8859-1.7       from iso_8859-1
> Converting man7/iso_8859-2.7       from iso_8859-2
> Converting man7/iso_8859-3.7       from iso_8859-3
> Converting man7/iso_8859-4.7       from iso_8859-4
> Converting man7/iso_8859-5.7       from iso_8859-5
> Converting man7/iso_8859-6.7       from iso_8859-6
> Converting man7/iso_8859-7.7       from iso_8859-7
> Converting man7/iso_8859-8.7       from iso_8859-8
> Converting man7/iso_8859-9.7       from iso_8859-9
> Converting man7/koi8-r.7           from koi8-r
> Converting man7/koi8-u.7           from koi8-u
> Converting man7/suffixes.7         from iso-8859-1
> 
> $ cd tmp_encoded/
> 
> $ ../print_encoding.sh man?/*
> 
>    Man Page               Encoding by file   Encoding by first line
> 
>  * man2/close.2           utf-8              UTF-8
>  * man2/getdomainname.2   utf-8              UTF-8
>  * man2/getrlimit.2       utf-8              UTF-8
>  * man2/madvise.2         utf-8              UTF-8
>  * man2/mount.2           utf-8              UTF-8
>  * man2/sysinfo.2         utf-8              UTF-8
>  * man2/umask.2           utf-8              UTF-8
>  * man3/encrypt.3         utf-8              UTF-8
>  * man3/fclose.3          utf-8              UTF-8
>  * man3/fflush.3          utf-8              UTF-8
>  * man3/lockf.3           utf-8              UTF-8
>  * man3/rand.3            utf-8              UTF-8
>  * man3/strtok.3          utf-8              UTF-8
>  * man3/toupper.3         utf-8              UTF-8
>  * man3/updwtmp.3         utf-8              UTF-8
>  * man4/st.4              utf-8              UTF-8
>  * man5/utmp.5            utf-8              UTF-8
>  * man7/armscii-8.7       utf-8              UTF-8
>  * man7/cp1251.7          utf-8              UTF-8
>  * man7/environ.7         utf-8              UTF-8
>  * man7/hier.7            utf-8              UTF-8
>  * man7/iso_8859-10.7     utf-8              UTF-8
>  * man7/iso_8859-11.7     utf-8              UTF-8
>  * man7/iso_8859-13.7     utf-8              UTF-8
>  * man7/iso_8859-14.7     utf-8              UTF-8
>  * man7/iso_8859-15.7     utf-8              UTF-8
>  * man7/iso_8859-16.7     utf-8              UTF-8
>  * man7/iso_8859-1.7      utf-8              UTF-8
>  * man7/iso_8859-2.7      utf-8              UTF-8
>  * man7/iso_8859-3.7      utf-8              UTF-8
>  * man7/iso_8859-4.7      utf-8              UTF-8
>  * man7/iso_8859-5.7      utf-8              UTF-8
>  * man7/iso_8859-6.7      utf-8              UTF-8
>  * man7/iso_8859-7.7      utf-8              UTF-8
>  * man7/iso_8859-8.7      utf-8              UTF-8
>  * man7/iso_8859-9.7      utf-8              UTF-8
>  * man7/koi8-r.7          utf-8              UTF-8
>  * man7/koi8-u.7          utf-8              UTF-8
>  * man7/suffixes.7        utf-8              UTF-8
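
The scripts themselves are not reproduced in this message. As a rough
illustration only, the conversion step in the transcript above could be
sketched as below. The function name to_utf8 and its handling of the first
line are assumptions, not taken from convert_to_utf_8.sh; the tag format
follows the encoding line quoted further down. Python's standard codecs do
not, as far as I know, cover every encoding in the listing (ARMSCII-8 would
need iconv or similar).

```python
import re

# Hypothetical helper; not the logic of convert_to_utf_8.sh.
CODING_TAG = re.compile(rb"-\*- coding: [\w.-]+ -\*-")
UTF8_TAG = b"-*- coding: UTF-8 -*-"


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode a man page source as UTF-8 and tag its first line."""
    # decode() raises if the bytes are not valid in source_encoding.
    body = data.decode(source_encoding).encode("utf-8")
    first, sep, rest = body.partition(b"\n")
    if CODING_TAG.search(first):
        # A tag is already present: rewrite the declared encoding.
        first = CODING_TAG.sub(UTF8_TAG, first)
    elif first.startswith(b"'\\\" "):
        # A preprocessor line such as '\" t: append the tag to it.
        first = first.rstrip() + b" " + UTF8_TAG
    else:
        # No special first line: put a tagged comment line in front.
        return b"'\\\" " + UTF8_TAG + b"\n" + body
    return first + sep + rest
```

Keeping a preprocessor line such as '\" t on line 1 is deliberate here:
pushing it down to line 2 would hide the preprocessor hint.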

Peter,

Sorry to be slow following up on this. Thanks for the scripts.

As some background, I'll just note that the current encoding markers in the
iso_8859* pages were added in response to this 2009 bug report:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=519209

It seems a reasonable idea to convert everything to UTF-8, but I have some
concerns/questions.

1. Is the encoding line: 
'\" t -*- coding: UTF-8 -*-
really needed, or does modern groff just work this out?

2. I'm concerned about backward compatibility. What if someone installs the
man pages on a system with an old groff? As far as I can work out, groff
added Unicode input support in v1.20, in 2009
(http://lists.gnu.org/archive/html/groff/2009-01/msg00011.html). So perhaps
that's long enough ago that we don't need to worry too much about this.
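
To make the two questions concrete, here is a small sketch of the checks
involved: reading the encoding named on a page's first line, and deciding
whether a given groff version predates 1.20. The function names are
hypothetical, and the 1.20 threshold is only my reading of the announcement
linked above, not something I have verified against groff itself.

```python
import re
from typing import Optional, Tuple

# Matches the Emacs-style tag, e.g. '\" t -*- coding: UTF-8 -*-
TAG = re.compile(r"-\*-\s*coding:\s*([\w.-]+)\s*-\*-")

# Assumption: Unicode input support arrived in groff 1.20.
UTF8_INPUT_SINCE = (1, 20)


def declared_encoding(first_line: str) -> Optional[str]:
    """Return the encoding named in a first-line coding tag, if any."""
    match = TAG.search(first_line)
    return match.group(1) if match else None


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.22.4' into (1, 22, 4) so versions compare numerically."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def old_groff(version: str) -> bool:
    """True if the version is older than the assumed 1.20 threshold."""
    return parse_version(version) < UTF8_INPUT_SINCE
```

Comparing tuples rather than strings matters: as plain text, "1.9" would
sort after "1.20".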

Any thoughts?




