At 09:31 PM 1/17/2007, Bruno Wolff III wrote:
On Wed, Jan 17, 2007 at 23:10:14 +0100, Ola Thoresen <redhat@xxxxxxxx> wrote:
>
> One of the worst examples of this is the change to UTF-8 as default
> charset. I am a devoted UTF-8 user myself, but it is probably the
> single change that has caused most pain for others, and it is still
> causing trouble.
> When we changed to UTF-8 as default, there was no
> easy way to convert filesystems, documents, text-files, webpages...
Not sure if these two utilities could help:

(1) iconv -f old-encoding -t UTF-8 filename > newfilename
(2) utf8ize
    The script: http://ftp.penguin.cz/pub/users/utx/misc/utf8ize.gopts
    The web page (search for utf8ize): http://www.penguin.cz/~utx/
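For anyone who would rather script this than run iconv by hand, the same conversion could be approximated in Python roughly as follows. This is only a sketch: the function name convert_to_utf8 is made up for illustration, and it assumes you already know the old encoding of the file.

```python
def convert_to_utf8(src_path, dst_path, old_encoding):
    """Re-encode src_path from old_encoding into UTF-8 at dst_path.

    Hypothetical helper, roughly: iconv -f old_encoding -t UTF-8 src > dst
    """
    # Decode strictly so bytes that are invalid in old_encoding raise an
    # error instead of being silently dropped. newline="" keeps the
    # original line endings untouched.
    with open(src_path, "r", encoding=old_encoding, errors="strict", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

Like iconv, this writes to a new file rather than converting in place, so the original stays around if the encoding guess turns out to be wrong.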
> The first thing almost everyone I know that are installing Fedora,
> Redhat or Suse is doing is to change /etc/sysconfig/i18n to go back to
> en_US as default LANG. Simply because it takes a h... of a lot of work
> to convert all your files and applications and there are no good tools
> out there to help you.

UTF-8 is an encoding and en_US is a locale. You are comparing different types of things. Perhaps you meant that UTF-8 was being used instead of ASCII or Latin 1? Note that ASCII is in a sense a subset of UTF-8, so converting from ASCII to UTF-8 isn't a big deal.
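The subset point can be checked quickly. The snippet below is a small Python illustration, not anything Fedora ships; the helper name is_pure_ascii is invented here. Bytes below 0x80 mean the same thing in ASCII and in UTF-8, so a pure-ASCII file needs no conversion at all, whereas a Latin 1 file with accented characters does.

```python
ascii_bytes = "LANG=en_US".encode("ascii")

# The same bytes decode to the same text under either encoding.
assert ascii_bytes.decode("ascii") == ascii_bytes.decode("utf-8")


def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte is below 0x80, i.e. already valid UTF-8."""
    return all(b < 0x80 for b in data)


# A Latin 1 "e acute" is the single byte 0xE9, which is not ASCII and,
# on its own, is not valid UTF-8 either.
latin1_bytes = "caf\u00e9".encode("iso-8859-1")
```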
One thing I feel GLib has not done enough of is provide API that supports non-UTF-8 content. For example, if a text file is opened using GIOChannel, the read fails if the file contains anything other than valid UTF-8.
The fallback could be more graceful; for example, the API could accept a fallback charset that is used to convert bytes that aren't legal UTF-8 into UTF-8. There should be enough API that is as tolerant of non-UTF-8 content as possible (for instance, by using a fallback charset).
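To make the idea concrete, here is a minimal sketch of a per-byte fallback. It is written in Python rather than against GLib, purely to illustrate the behaviour I have in mind; the handler name "fallback-charset", the constant FALLBACK_CHARSET and the function to_text are all invented for this example, and ISO-8859-1 is just an assumed legacy charset.

```python
import codecs

FALLBACK_CHARSET = "iso-8859-1"  # assumption: the user's legacy charset


def _decode_with_fallback(err):
    """Error handler: decode only the offending bytes with the fallback charset."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    # Return the replacement text and the position where decoding resumes.
    return bad.decode(FALLBACK_CHARSET), err.end


# "fallback-charset" is a name made up for this sketch.
codecs.register_error("fallback-charset", _decode_with_fallback)


def to_text(data: bytes) -> str:
    """Decode as UTF-8, converting illegal bytes via the fallback charset."""
    return data.decode("utf-8", errors="fallback-charset")
```

With this, to_text(b"\xa9 2007 caf\xc3\xa9") keeps the valid UTF-8 "é" as it is and converts only the stray 0xA9 byte, giving "© 2007 café". The obvious caveat is that a fallback can only guess: bytes that happen to form valid UTF-8 are never reinterpreted.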
For example, many people used a single European charset before UTF-8 became mainstream. With just one fallback charset specified, all of these people would be covered: their existing files could be opened, and new files would be saved as UTF-8.
As it is now, if you want your application to support reading both UTF-8 and ISO-8859-1 (just the two most common encodings, nothing more), most facilities in GLib are not an option -- if a text file contains just one copyright symbol encoded in ISO-8859-1, you fail to read the entire file... very far from an ideal scenario.
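The scenario above, and the simplest whole-file workaround, might look like this in Python. Again this is an illustration under assumptions, not GLib code: read_text and save_as_utf8 are hypothetical helpers, and ISO-8859-1 is assumed as the one fallback charset.

```python
def read_text(path, fallback="iso-8859-1"):
    """Read a text file as UTF-8, retrying with a fallback charset on failure."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # A single illegal byte (e.g. 0xA9, the ISO-8859-1 copyright sign)
        # is enough to land here; redecode the whole file with the fallback.
        return data.decode(fallback)


def save_as_utf8(path, text):
    """Write text back out as UTF-8, leaving line endings untouched."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```

Old files open through the fallback and anything written afterwards is UTF-8, so the legacy charset gradually disappears without the user having to convert everything up front.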
What do people think?

--
Daniel Yek

--
fedora-devel-list mailing list
fedora-devel-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/fedora-devel-list