On 11/25/2015 10:49 AM, Francisco Olarte wrote:
Mail pingpong day. ;-)
On Wed, Nov 25, 2015 at 7:27 PM, Adrian Klaver
<adrian.klaver@xxxxxxxxxxx> wrote:
postgres@latin1_db=# \encoding
UTF8
This does not matter, as you are sending/receiving hex data, and the
encoding done when sending query results to you gets reverted when you
send queries back to the server.
postgres@latin1_db=# select convert_from('\xe9', 'latin1');
convert_from
--------------
é
(1 row)
This DOES matter, you've had to change the query for it to work, you
had to look up ( or know beforehand ) the database encoding and change
it accordingly. . I mean, when you do encode('text',....) what you are
really asking for is encode( implicit_text_to_bytea_conversion(text),
....), so you save one step, but them you have to change your query to
the 'latin1' encoding you discovered. This is what I meant, you had to
look at the database properties. But if you do an explicit
convert_form with any encoding capable of representing all your data,
like utf-8 which can represent anything, the database encoding does
not matter. And it should not, proper code should work with any
database encoding. Even more, I can do encode(convert_to(utf8)) in a
latin1 database conecting with any encoding, then send the hex to
convert_from(decode(),utf8) to an ebcdic database use another encoding
( Of course I may need to transcode the hex, but not to the ebcdic,
but to the client encoding been used in teh second case ), ant it
would work as long as all the characters In the source data are
representable in the destination database ( if they are not you are
out luck in any scheme you may think off ).
I will grant you that working with encodings is like working with
timestamps, explicit is better. The thing I am having a problem with is
how not knowing the context of the bytea value is different in the
implicit vs explicit case:
create table hex_test(hex_fld bytea);
aklaver@latin1_db=> select convert_to('é', 'utf8');
convert_to
------------
\xc3a9
(1 row)
aklaver@latin1_db=> insert into hex_test values ('\xc3a9');
aklaver@latin1_db=> select encode('é', 'hex');
encode
--------
e9
(1 row)
aklaver@latin1_db=> select decode('e9', 'hex');
decode
--------
\xe9
(1 row)
aklaver@latin1_db=> insert into hex_test values ('\xe9');
aklaver@latin1_db=> select * from hex_test ;
hex_fld
---------
\xc3a9
\xe9
(2 rows)
aklaver@latin1_db=> select convert_from(hex_fld, 'latin1') from hex_test ;
convert_from
--------------
é
é
(2 rows)
aklaver@latin1_db=> select convert_from(hex_fld, 'utf8') from hex_test ;
ERROR: invalid byte sequence for encoding "UTF8": 0xe9
Granted the above is contrived and bound to fail, but the point is you
need to know what created the bytea however it got there. Now if you are
in charge of both ends of the process, then the above is your own fault.
Otherwise, you are down to detective work on what encoding was used
whether it was implicit or explicit. As the OP was working in a single
context I am not seeing the issue in making use of that context to do
the heavy lifting. For the use cases that you show I agree that a
defined convert_to/encode/decode/convert_from chain is a best practice
and something I had not really thought out, so thanks.
You cannot encode generically an string to hex unless you define an
encoding. How do you encode '1', "31" or "F1" ? or maybe "0031" or
"3100"? You can do a quick hack, but normally what you want is first
to encode a sequence of characters to a sequence of bytes and then
hex-encode that, as nearly everybody uses the same conversion for
hex-encoding a byte sequence. This means you can have a '0' in a
ebcdic database, transform it to to [0x30] byte array, encode this as
"30" and then transform the later to 00 30 00 10 because you are using
UTF16-BE wire encoding. Encoding is tricky enough without relying on
implicit convertion or on a character being the same as a byte.
Francisco Olarte.
--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general