On Thu, Jul 27, 2017 at 04:00:53PM +0100, Richard W.M. Jones wrote:
> On Tue, Jul 25, 2017 at 01:48:41PM +0100, Daniel P. Berrange wrote:
> > Setting LC_ALL=C breaks python apps doing I/O on UTF-8 source
> > files. In particular this broke glib-mkenums
> >
> >   Traceback (most recent call last):
> >     File "/usr/bin/glib-mkenums", line 669, in <module>
> >       process_file(fname)
> >     File "/usr/bin/glib-mkenums", line 406, in process_file
> >       line = curfile.readline()
> >     File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode
> >       return codecs.ascii_decode(input, self.errors)[0]
> >   UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 849: ordinal not in range(128)
> >
> > Signed-off-by: Daniel P. Berrange <berrange@xxxxxxxxxx>
> > ---
> >
> > Pushed to fix rawhide build
> >
> >  maint.mk | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/maint.mk b/maint.mk
> > index 79104d0..2e70cae 100644
> > --- a/maint.mk
> > +++ b/maint.mk
> > @@ -117,8 +117,8 @@ news-check-lines-spec ?= 1,10
> >  news-check-regexp ?= '^\*.* $(VERSION_REGEXP) \($(today)\)'
> >
> >  # Prevent programs like 'sort' from considering distinct strings to be equal.
> > -# Doing it here saves us from having to set LC_ALL elsewhere in this file.
> > -export LC_ALL = C
> > +# Doing it here saves us from having to set LC_COLLATE elsewhere in this file.
> > +export LC_COLLATE = C
>
> I don't know what the answer is, but two observations:
>
> (1) We had the same problem in libguestfs and this was our fix:
>
> https://github.com/libguestfs/libguestfs/commit/f861c138550a0c99247a6955aa2c594f380867f4

Hmm, unsetting LC_ALL means the output of the script is potentially
affected by differing sort ordering of the user's locale, which is
why I kept setting LC_COLLATE.
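
To illustrate the sort-ordering point, here is a minimal sketch (not
part of the patch; the sample input and variable names are made up for
illustration). It shows that LC_COLLATE=C on its own pins 'sort' to
byte order just as LC_ALL=C does, provided LC_ALL is not also set,
since LC_ALL overrides the individual categories:

```shell
#!/bin/sh
# Illustrative input only: mixed-case strings whose order is locale-dependent.
input='b
A
a
B'

# LC_ALL=C forces every locale category (including LC_CTYPE) to "C".
with_lc_all=$(printf '%s\n' "$input" | LC_ALL=C sort | tr '\n' ' ')

# LC_COLLATE=C pins only the collation order. LC_ALL is unset in the
# subshell because it would otherwise take precedence over LC_COLLATE.
with_lc_collate=$(printf '%s\n' "$input" | (unset LC_ALL; LC_COLLATE=C sort) | tr '\n' ' ')

echo "LC_ALL=C:     $with_lc_all"
echo "LC_COLLATE=C: $with_lc_collate"
```

Both lines should list the uppercase letters before the lowercase
ones, whereas a locale such as en_US.UTF-8 would typically interleave
them.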

It seems that a better approach would be to use C.UTF-8, and then
fall back to en_US.UTF-8 on systems which lack it, since en_US is
still pretty close to C in its semantics, while supporting UTF-8
everywhere.

e.g. change maint.mk to be

  export LC_ALL = $(shell LC_ALL=C.utf-8 locale -ck charmap 2>/dev/null | \
                    grep -i UTF-8 1>/dev/null 2>&1 && \
                    echo "C.UTF-8" || echo "en_US.UTF-8")

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

_______________________________________________
virt-tools-list mailing list
virt-tools-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/virt-tools-list