unicode encoding issues

Adam Williamson <awilliam@xxxxxxxxxx> · Thu, 30 Apr 2015 19:16:55 -0700

So at the time we turned anaconda translations into unicodes I guessed 
we'd just be swapping one set of UnicodeDecodeErrors for another on the 
live images; unfortunately it seems like that's what's happening. We've 
already found two new UnicodeDecodeErrors in 22 Final TC1 that have been 
caused by turning translations into unicodes:

https://bugzilla.redhat.com/show_bug.cgi?id=1217504
https://bugzilla.redhat.com/show_bug.cgi?id=1217610

and I suspect this bug is somehow caused by the same thing, though it's 
not as clear cut:

https://bugzilla.redhat.com/show_bug.cgi?id=1217411

Looking at the first two bugs, I checked through anaconda for instances 
of str(e) (on the basis that 'e' is conventionally used for exceptions): 
there are 30. I also used a somewhat dumb grep to try and find cases 
where we do a %s substitution into a translation:

grep -R "_(.*%" *

and that gives 244 cases.
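
To make it concrete why those two patterns are suspect, here's a rough sketch of the failure mode. In Python 2, substituting a byte str into a unicode object triggers an implicit decode using the default encoding; the sketch spells that decode out explicitly so it behaves the same way on Python 3. The helper names (implicit_coerce, format_message) and the sample strings are made up for illustration; none of this is anaconda code.

```python
def implicit_coerce(byte_string, default_encoding="ascii"):
    """Model the decode Python 2 performs when a byte str meets a unicode."""
    return byte_string.decode(default_encoding)


def format_message(template, arg, default_encoding="ascii"):
    """Substitute a byte-string argument into a unicode template."""
    return template % implicit_coerce(arg, default_encoding)


# Hypothetical translated template, as a unicode object (what _() now returns).
TEMPLATE = u"\xc9chec : %s"
# Hypothetical UTF-8 encoded detail, e.g. the bytes that str(e) might produce.
DETAIL = u"p\xe9riph\xe9rique".encode("utf-8")

try:
    format_message(TEMPLATE, DETAIL)
    outcome = "ok"
except UnicodeDecodeError:
    # The 'ascii' codec cannot decode the non-ASCII bytes in DETAIL.
    outcome = "UnicodeDecodeError"
```

With the default encoding modelled as utf-8 instead (format_message(TEMPLATE, DETAIL, "utf-8")), the same substitution goes through, which is the whole appeal of the big hammer discussed further down.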

One strand of this whole nightmare that we kinda lost track of is that 
*this doesn't always go wrong*. Sometimes, Python does somehow know to 
use utf-8 rather than ascii. sgallagh and I got some way towards 
investigating this back in the F21 timeframe, but eventually we moved on 
to other angles. I thought it might have something to do with the 
setup_locale() call in welcome.py, but the anaconda script itself 
already calls that much earlier, so now I'm not so sure.
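
One way to narrow this down might be to dump the encodings Python has picked up at a few different points (right at startup, before and after setup_locale(), just before a failing string) and compare live against non-live. A minimal sketch, using only standard library calls; the function name encoding_report is made up, and where exactly to call it is left open:

```python
import locale
import sys


def encoding_report():
    """Collect the encodings Python could pick up, for comparing environments."""
    return {
        "default": sys.getdefaultencoding(),
        "filesystem": sys.getfilesystemencoding(),
        # False: only report the locale's encoding, do not call setlocale().
        "preferred": locale.getpreferredencoding(False),
        # stdout may have been replaced by an object without .encoding.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in sorted(encoding_report().items()):
        print("%-10s %s" % (name, value))
```

If "default" reads ascii in the cases that break and utf-8 in the cases that don't, that would at least confirm which knob is making the difference.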

We still have the option of the big hammer to force the 'default' 
encoding to be utf-8 on the lives as well as non-lives. I am (with 
extreme regret) reading 
http://www.gossamer-threads.com/lists/engine?do=post_view_flat;post=800861;page=1;sb=post_latest_reply;so=ASC;mh=25;list=python 
again, which I think is where we get the objections to doing that. And 
it sure sounds bad:

===================

"If you change these, you are on your own and strange things will
start to happen. The default encoding does not only affect
the translation between Python and the outside world, but also
all internal conversions between 8-bit strings and Unicode.

Hacks like [this] are just
downright wrong and will cause serious problems since Unicode
objects cache their default encoded representation."

"The key problem is that objects that compare equal should also hash
equal. String and Unicode hashing has been constructed so that byte
strings hash the same as if interpreted as latin-1. If, say, utf-8
would be the system encoding, then, for some values of S,

S == unicode(S) and hash(S) != hash(unicode(S))

That, in turn, *will* break dictionaries."

====================
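
For what it's worth, here's my understanding of the second objection as a toy model. It is not real Python 2 behaviour, just a class whose equality follows a utf-8 default encoding while its hash still treats each byte as one code point (the latin-1 view the quote describes). ByteString and the sample keys are invented for the illustration.

```python
class ByteString(object):
    """Toy stand-in for a Python 2 byte str under a utf-8 default encoding."""

    def __init__(self, raw):
        self.raw = raw

    def __eq__(self, other):
        if isinstance(other, ByteString):
            return self.raw == other.raw
        # Equality follows the (changed) default encoding: utf-8.
        return self.raw.decode("utf-8") == other

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        # Hashing still treats each byte as one code point (latin-1 view).
        return hash(self.raw.decode("latin-1"))


table = {u"abc": 1, u"\xe9": 2}
ascii_key = ByteString(b"abc")
accented_key = ByteString(u"\xe9".encode("utf-8"))
```

Here ascii_key compares equal to u"abc" and finds its entry in table, because for pure ASCII both views agree. accented_key compares equal to u"\xe9" too, but it hashes differently, so the dictionary lookup misses. That's the "equal but not hash-equal" situation the quote warns about, and it only bites for non-ASCII byte strings used as keys alongside their unicode equivalents.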

But then - none of this seems unique to the live image case. On 
non-lives, we *already use exactly the hack they say is so terrible* - 
it's the whole reason we have pyanaconda/sitecustomize.py:

import sys
# pylint: disable=no-member
sys.setdefaultencoding('utf-8')

I may be missing something, but so far as I can see, while we would have 
to implement the hack slightly differently in the live case, the 
different implementation isn't any *more* dangerous than the one we're 
already using in the non-live case. The only thing different about the 
live case is the use of reload(sys) vs. using the sitecustomize trick, 
and so far as I can see, none of the objections to this hack are about 
the use of reload(sys), they're about the use of 
sys.setdefaultencoding().
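
For reference, the runtime variant would look roughly like the sketch below. The reason reload(sys) is needed at all is that site.py deletes sys.setdefaultencoding once startup is finished, so it's only directly callable from sitecustomize.py; reloading the module brings the attribute back. The function name is made up, and the Python 3 branch is only there so the snippet doesn't blow up if someone runs it on a newer interpreter.

```python
import sys


def force_utf8_default_encoding():
    """Force the default encoding to utf-8 at runtime (Python 2 only)."""
    if sys.version_info[0] >= 3:
        # Python 3 has no sys.setdefaultencoding; the default is already utf-8.
        return sys.getdefaultencoding()
    # site.py removed sys.setdefaultencoding during startup, so the module
    # has to be reloaded to get the function back.
    reload(sys)  # noqa: F821 (reload is a builtin on Python 2)
    sys.setdefaultencoding("utf-8")  # pylint: disable=no-member
    return sys.getdefaultencoding()
```

Either way the end state is the same call to sys.setdefaultencoding('utf-8'); only the route to it differs.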

If I'm wrong about that, do enlighten me :)

Otherwise, though, what exactly do we have to lose? I'm happy with the 
idea that it's the wrong thing to do. We do lots of wrong things. Some 
days I do 30 wrong things before breakfast. If the only other 
alternative is poking through the entire installer trying to trigger 
every goddamn translated string to find all the broken cases, let's do 
something wrong.
--
Adam Williamson
Fedora QA Community Monkey
IRC: adamw | Twitter: AdamW_Fedora | XMPP: adamw AT happyassassin . net
http://www.happyassassin.net
