Patrice Dumas wrote:
> On Fri, May 30, 2008 at 06:56:33PM -0700, Toshio Kuratomi wrote:
>> Reencoding the xml files that specify an encoding isn't strictly necessary. We should probably ask upstream whether they are amenable to
>
> I think that reencoding files that carry over the encoding information (info, texinfo, tex and xml for example) is wrong. It is better to let upstream do whatever they want. Same for examples of code, better leave the encoding preferred by upstream. For NEWS/Changelog, other text files in %doc and also man pages that are not installed in a non utf8 locale, I agree that converting to UTF-8 is better.

I'm almost in complete agreement with you. The one extra piece that I think should be considered is how the text is normally viewed/edited.
For instance, if a program has a plain text data file and the program expects the data to be encoded in utf-16 that should stay utf-16. Since the end user never views the file and the program has an expectation of what's in it, this should be perfectly acceptable.
However, the flipside of this is if a program has an xml config file that the user is expected to edit manually in a text editor and the program will adapt to multiple encodings (for instance, by using libxml2 to parse the file[1]_) having it exist in utf-8 is much better than having it exist in SOME_EXOTIC_ENCODING. In this case it's the program that doesn't care that the config file is in utf-8 vs SHIFT-JIS. But the user that opens the file in a text editor will be presented with garbage if the text does not match the system default encoding. Yes, the user can manually change the encoding that is displayed and saved in some editors but:
1) Not every editor can do this.
2) The user has to learn to enable the new encoding in their editor. This involves reading, editing, and saving. Some editors will display garbage unless you set the correct encoding on startup; others can change while running; some convert on open with a best guess at what the bytes mean, but you have to specify which encoding to save the result in, otherwise you get the default (utf-8 or dependent on your locale settings).
3) If the user wants to use characters that are not present in the encoding the file is written in (for instance, the file is encoded in KOI8-R but the user wants to use kanji), they'll have to convert the file to a unicode family of encodings and edit the header that tells the character set to use before making their changes.
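As a rough illustration of the points above, here is a small Python sketch. The helper names (decode_as, can_encode) and the sample strings are made up for this example; it only shows what an editor that assumes the wrong encoding ends up with, and why a KOI8-R file cannot hold kanji.

```python
def decode_as(raw: bytes, assumed: str) -> str:
    """Decode bytes the way an editor assuming `assumed` would.

    Bytes that are invalid in the assumed encoding become U+FFFD,
    the replacement character many editors display.
    """
    return raw.decode(assumed, errors="replace")


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


sample = "設定"  # hypothetical file content: two kanji
sjis = sample.encode("shift_jis")

# The right encoding round-trips; assuming UTF-8 does not.
print(decode_as(sjis, "shift_jis") == sample)
print(decode_as(sjis, "utf-8") == sample)

# KOI8-R covers Cyrillic but has no kanji, so point 3) applies.
print(can_encode("настройки", "koi8_r"))
print(can_encode(sample, "koi8_r"))
print(can_encode(sample, "utf-8"))
```

A file that is already in utf-8 avoids both problems on a system whose default encoding is utf-8.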
So really, whether the user is intended to edit/view the file directly instead of through a program that can change the encoding appropriately should be the dividing line rather than whether the format specifies the encoding/does not specify encoding.
Whether this is something we should do in our packages even if upstream doesn't accept the changes involves other factors. In the case of documentation files that have no encoding we should convert whether or not upstream agrees. In the case of documentation that does specify the encoding I lean towards converting [2]_. In the case of a file that is used by a program we should definitely have a conversation with upstream about it, although we could convert locally with upstream's blessing (ie: Upstream says: "I'm going to continue writing my xml config file in latin-1. If you want to convert them to utf-8 for your users that's fine -- I'm going to continue to use a library for xml parsing that understands encodings.")
.. _[1]: http://xmlsoft.org/encoding.html#Default
.. _[2]: Note that this is only for documentation which is not supposed to be viewed directly. xhtml, for instance, is normally going to be viewed in a browser so this would not apply.
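To make the "convert locally" case for an xml config file concrete, here is a minimal Python sketch. It assumes the packager already knows the source encoding; the function name reencode_xml and the sample config are hypothetical. It re-encodes the bytes and rewrites the encoding in the xml declaration so the two stay in agreement, which is the header edit described in point 3).

```python
import re
import xml.etree.ElementTree as ET

# Matches the encoding pseudo-attribute inside the leading <?xml ...?> declaration.
_DECL_ENCODING = re.compile(r'(<\?xml[^>]*encoding=)(["\'])[^"\']*\2')


def reencode_xml(raw: bytes, source_encoding: str) -> bytes:
    """Convert an xml document to utf-8 and update its declaration.

    `source_encoding` is supplied by the caller; this sketch does not
    try to detect it.
    """
    text = raw.decode(source_encoding)
    text = _DECL_ENCODING.sub(r"\g<1>\g<2>UTF-8\g<2>", text, count=1)
    return text.encode("utf-8")


# Hypothetical latin-1 config file as upstream might ship it.
original = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
    '<config name="café"/>\n'
).encode("latin-1")

converted = reencode_xml(original, "latin-1")

# A parser that honours the declaration reads the same value either way.
print(ET.fromstring(original).get("name") == ET.fromstring(converted).get("name"))
```

The parser sees the same data before and after; only the person opening the file in a text editor notices the difference.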
-Toshio
--
Fedora-packaging mailing list
Fedora-packaging@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/fedora-packaging