Search Postgresql Archives

Re: psql weird behaviour with charset encodings

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



hernan gonzalez <hgonzalez@xxxxxxxxx> writes:
> The issue is that psql tries (apparently) to convert to UTF8
> (even when he plans to output the raw text -LATIN9 in this case)
> just for computing the lenght of the field, to build the table.
> And because for this computation he (apparently) rely on the string
> routines with it's own locale, instead of the DB or client encoding.

I didn't believe this, since I know perfectly well that the formatting
code doesn't rely on any OS-supplied width calculations.  But when I
tested it out, I found I could reproduce Hernan's problem on Fedora 11.
Some tracing showed that the problem is here:

                fprintf(fout, "%.*s", bytes_to_output,
                        this_line->ptr + bytes_output[j]);

As the variable name indicates, psql has carefully calculated the number
of *bytes* it wants to print.  However, it appears that glibc's printf
code interprets the parameter as the number of *characters* to print,
and to determine what's a character it assumes the string is in the
environment LC_CTYPE's encoding.  I haven't dug into the glibc code to
check, but it's presumably barfing because the string isn't valid
according to UTF8 encoding, and then failing to print anything.

It appears to me that this behavior violates the Single Unix Spec,
which says very clearly that the count is a count of bytes:
http://www.opengroup.org/onlinepubs/007908799/xsh/fprintf.html
However, I'm quite sure that our chances of persuading the glibc boys
that this is a bad idea are zero.  I think we're going to have to
change the code to not rely on %.*s here.  Even without the charset
mismatch in Hernan's example, we'd be printing the wrong amount of
data anytime the LC_CTYPE charset is multibyte.  (IOW, the code should
do the wrong thing with forced-line-wrap cases if LC_CTYPE is UTF8,
even if client_encoding is too; anybody want to check?)

The above coding is new in 8.4, but it's probably not the only use of
%.*s --- we had better go looking for other trouble spots, too.

			regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux