unicode encoding issues

Back when we turned anaconda's translations into unicode objects, I guessed we'd just be swapping one set of UnicodeDecodeErrors for another on the live images; unfortunately, that seems to be what's happening. We've already found two new UnicodeDecodeErrors in 21 Final TC1 that were caused by turning translations into unicode objects:

https://bugzilla.redhat.com/show_bug.cgi?id=1217504
https://bugzilla.redhat.com/show_bug.cgi?id=1217610

and I suspect this bug is somehow caused by the same thing, though it's not as clear cut:

https://bugzilla.redhat.com/show_bug.cgi?id=1217411

Looking at the first two bugs, I checked through anaconda for instances of str(e) (on the basis that 'e' is conventionally used for exceptions): there are 30. I also used a somewhat dumb grep to try and find cases where we do a %s substitution into a translation:

grep -R "_(.*%" *

and that gives 244 cases.
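
For anyone who hasn't stared at these tracebacks before, here is a rough sketch of the failure mode behind both patterns. The strings are made-up stand-ins, not real anaconda translations, and py2_coerce() is a hypothetical helper that spells out the implicit str -> unicode decode Python 2 performs when a byte string is substituted into a unicode object. Writing the decode explicitly lets the snippet run on Python 2 and 3 alike:

```python
# -*- coding: utf-8 -*-

def py2_coerce(raw, default_encoding="ascii"):
    """Spell out the implicit str -> unicode decode that Python 2 performs."""
    return raw.decode(default_encoding)


# Hypothetical stand-ins: a unicode translation, and a UTF-8 byte string
# of the kind str(e) can yield for a localized error message.
translated = u"\u00c9chec de l'installation : %s"
detail = u"p\u00e9riph\u00e9rique occup\u00e9".encode("utf-8")


def format_message(default_encoding):
    """Substitute the byte string into the translation, Python 2 style."""
    return translated % py2_coerce(detail, default_encoding)


try:
    format_message("ascii")
    outcome = "ok"
except UnicodeDecodeError as err:
    # With an ascii default, the first non-ASCII byte aborts the decode.
    outcome = "UnicodeDecodeError: %s" % err.reason

print(outcome)
```

With "utf-8" passed as the default encoding, the same substitution goes through, which is the effect the sys.setdefaultencoding() hack has interpreter-wide.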

One strand of this whole nightmare that we kinda lost track of is that *this doesn't always go wrong*. Sometimes Python does somehow know to use utf-8 rather than ascii. sgallagh and I got some way towards investigating this back in the F21 timeframe, but eventually we moved on to other angles. I thought it might have something to do with the setup_locale() call in welcome.py, but the anaconda script itself already calls that much earlier, so now I'm not so sure.
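
One cheap way to narrow this down might be to dump what the interpreter believes its encodings are at a few points during startup (before and after setup_locale(), say) and compare a run that fails with one that doesn't. A minimal sketch, where encoding_report() is a hypothetical helper and not something that exists in anaconda:

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python is currently working with."""
    return {
        # Used for implicit str <-> unicode conversions on Python 2.
        "default": sys.getdefaultencoding(),
        # Used for file names.
        "filesystem": sys.getfilesystemencoding(),
        # Derived from the locale; False means "don't call setlocale()".
        "preferred": locale.getpreferredencoding(False),
        # May be None when stdout is not a terminal.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


report = encoding_report()
for name in sorted(report):
    print("%s: %s" % (name, report[name]))
```

This doesn't explain the inconsistency by itself, but it would at least show whether the working and failing cases differ in the default encoding or only in the locale-derived ones.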

We still have the option of the big hammer to force the 'default' encoding to be utf-8 on the lives as well as non-lives. I am (with extreme regret) reading http://www.gossamer-threads.com/lists/engine?do=post_view_flat;post=800861;page=1;sb=post_latest_reply;so=ASC;mh=25;list=python again, which I think is where we get the objections to doing that. And it sure sounds bad:

===================

"If you change these, you are on your own and strange things will
start to happen. The default encoding does not only affect
the translation between Python and the outside world, but also
all internal conversions between 8-bit strings and Unicode.

Hacks like [this] are just
downright wrong and will cause serious problems since Unicode
objects cache their default encoded representation."

"The key problem is that objects that compare equal should also hash
equal. String and Unicode hashing has been constructed so that byte
strings hash the same as if interpreted as latin-1. If, say, utf-8
would be the system encoding, then, for some values of S,

S == unicode(S) and hash(S) != hash(unicode(S))

That, in turn, *will* break dictionaries."

====================
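
To make the second objection concrete, here is a toy model of it. ToyByteStr is an invented class, not real interpreter behaviour: it compares against unicode by decoding with a configurable default encoding, and hashes as if its bytes were latin-1, as the quote describes. It runs on Python 2 and 3:

```python
# -*- coding: utf-8 -*-

class ToyByteStr(object):
    """Toy model of a Python 2 byte string (an assumption, for illustration)."""

    def __init__(self, raw, default_encoding):
        self.raw = raw
        self.default_encoding = default_encoding

    def __eq__(self, other):
        if isinstance(other, ToyByteStr):
            return self.raw == other.raw
        try:
            # Comparing with unicode decodes via the default encoding.
            return self.raw.decode(self.default_encoding) == other
        except UnicodeDecodeError:
            return False

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        # Per the quote: byte strings hash as if interpreted as latin-1.
        return hash(self.raw.decode("latin-1"))


E_ACUTE = u"\u00e9"
table = {E_ACUTE: "found", u"abc": "found"}

ascii_key = ToyByteStr(b"abc", "utf-8")
utf8_key = ToyByteStr(E_ACUTE.encode("utf-8"), "utf-8")

# Pure-ASCII keys are fine: equal and same hash, so the lookup works.
ascii_lookup = table.get(ascii_key)

# Non-ASCII keys compare equal under a utf-8 default, but the hashes
# differ, so the dictionary lookup misses.
equal = (utf8_key == E_ACUTE)
utf8_lookup = table.get(utf8_key)

print(ascii_lookup, equal, utf8_lookup)
```

In this model the damage is limited to non-ASCII byte strings used as dictionary keys alongside their unicode equivalents, which is narrower than "everything breaks" but still a real hazard.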

But then - none of this seems unique to the live image case. On non-lives, we *already use exactly the hack they say is so terrible* - it's the whole reason we have pyanaconda/sitecustomize.py:

import sys
# pylint: disable=no-member
sys.setdefaultencoding('utf-8')

I may be missing something, but so far as I can see, while we would have to implement the hack slightly differently in the live case, the different implementation isn't any *more* dangerous than the one we're already using in the non-live case. The only difference in the live case is the use of reload(sys) instead of the sitecustomize trick, and none of the objections to this hack seem to be about reload(sys); they're about sys.setdefaultencoding() itself.
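
For reference, the live-case variant would look roughly like the following. This is a sketch, not a patch: force_utf8_default() is a hypothetical name, and where exactly it would be called during startup is left open. The reload() is needed because Python 2's site.py deletes sys.setdefaultencoding once startup is done, which is why sitecustomize.py can call it directly but later code cannot. The version guard is only there so the snippet is harmless on Python 3:

```python
import sys


def force_utf8_default():
    """Python 2 only: make utf-8 the interpreter-wide default encoding."""
    if sys.version_info[0] != 2:
        # Python 3 has no setdefaultencoding(); its default is already utf-8.
        return sys.getdefaultencoding()
    if not hasattr(sys, "setdefaultencoding"):
        # site.py removed the function at startup; reloading sys restores it.
        reload(sys)  # pylint: disable=undefined-variable
    sys.setdefaultencoding("utf-8")  # pylint: disable=no-member
    return sys.getdefaultencoding()


current = force_utf8_default()
print(current)
```

The call that changes behaviour is the same sys.setdefaultencoding('utf-8') in both variants; only the route to reaching it differs.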

If I'm wrong about that, do enlighten me :)

Otherwise, though, what exactly do we have to lose? I'm happy with the idea that it's the wrong thing to do. We do lots of wrong things. Some days I do 30 wrong things before breakfast. If the only other alternative is poking through the entire installer trying to trigger every goddamn translated string to find all the broken cases, let's do something wrong.
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net

_______________________________________________
Anaconda-devel-list mailing list
Anaconda-devel-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/anaconda-devel-list