On 20 March 2013 23:35, David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote:
> On Tue, 2013-03-19 at 12:24 +0100, Nicolas Mailhot wrote:
>>
>> On Tue, 19 March 2013 11:38, Ian Malone wrote:
>>
>> > and holding up the release for what is basically a triviality seems a
>> > bit silly.
>>
>> The perception that correct UTF-8 handling is a triviality that should be
>> worked on at some later date is the reason we have this breakage now.
>
> No. As I understand it, this bug would have happened if we were still in
> the 20th century and using the legacy 8-bit encodings too.
>
> We have an 'is it text?' function which arbitrarily allows 2% of bytes
> to be >= 0x80. Which means that even in ISO8859-1, a file containing
> just the words "Schrödinger's Cat" wouldn't be considered to be text.
>
> It's just broken; it's not even UTF-8 specific. In fact, UTF-8 makes
> things *easier* because you can check for valid UTF-8 byte sequences
> instead of just bytes >= 0x80.
>

I had to read that twice before my brain would accept its meaning.
There's a smiley on a forum I use that would be appropriate, but :eek:
isn't quite so good without the picture.

-- 
imalone
http://ibmalone.blogspot.co.uk
-- 
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/devel
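
For anyone who wants to see the arithmetic behind the quoted point, here is a minimal Python sketch. It is not the code under discussion: the function names, the exact comparison, and the handling of empty input are assumptions made for illustration. Only the 2% threshold and the ">= 0x80" test come from the thread.

```python
def looks_like_text_by_ratio(data: bytes, max_high_ratio: float = 0.02) -> bool:
    """Hypothetical ratio check: accept only if at most 2% of bytes are >= 0x80."""
    if not data:
        return True  # assumption: treat empty input as text
    high = sum(1 for b in data if b >= 0x80)
    return high / len(data) <= max_high_ratio


def looks_like_text_utf8(data: bytes) -> bool:
    """Alternative check: accept only if the bytes form valid UTF-8 sequences."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


sample = "Schrödinger's Cat"
latin1 = sample.encode("iso-8859-1")  # 17 bytes, 1 of them >= 0x80
utf8 = sample.encode("utf-8")         # 18 bytes, 2 of them >= 0x80

print(looks_like_text_by_ratio(latin1))  # 1/17 is about 5.9%, above the 2% limit
print(looks_like_text_by_ratio(utf8))    # 2/18 is about 11.1%, above the 2% limit
print(looks_like_text_utf8(utf8))        # valid UTF-8 byte sequences
```

One non-ASCII character in a short string is already enough to exceed the threshold in either encoding, which is why the ratio test is not a UTF-8-specific problem. The validity check is only a sketch too: it would reject the ISO8859-1 bytes, and it does nothing about control characters or NUL bytes.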