Hi Adrian:

On Wed, Nov 25, 2015 at 9:33 PM, Adrian Klaver <adrian.klaver@xxxxxxxxxxx> wrote:
> I will grant you that working with encodings is like working with
> timestamps, explicit is better. The thing I am having a problem with is how
> not knowing the context of the bytea value is different in the implicit vs
> explicit case:
> Granted the above is contrived and bound to fail, but the point is you need
> to know what created the bytea however it got there. Now if you are in
> charge of both ends of the process, then the above is your own fault.
> Otherwise, you are down to detective work on what encoding was used whether
> it was implicit or explicit. As the OP was working in a single context I am
> not seeing the issue in making use of that context to do the heavy lifting.
> For the use cases that you show I agree that a defined
> convert_to/encode/decode/convert_from chain is a best practice and something
> I had not really thought out, so thanks.

Well, I tend to think as a programmer, so I was thinking of the scenario where you want to get some text from the database and move it around, and you control all the code. What I was trying to point out is that with explicit conversions all the character sets that matter are spelled out in the queries, so I can move the data around freely, as I'm the one defining the queries, while in the implicit case I have to know, or get from the catalogs, the encoding of the database.

I do not know in what context the OP was working; I just wanted to point out that he was mixing types. Postgres has a lot of implicit conversions, especially to/from text, and I've found out the hard way that they are great for one-shot programs or interactive tests, but relying on them in real production code, written down in a source file, causes a lot of problems. In a single-session case you can even use an implicit conversion plus encode() and then paste the result into a convert_from(), adding the quotes and the leading \x, and it is going to work; but if you write that down in code you are going to be confused when something fails later. Things like:

cdrs=> select encode('Año','hex');
  encode
----------
 41c3b16f

cdrs=> select convert_from('\x41c3b16f', 'UTF-8');
 convert_from
--------------
 Año
(1 row)

seem like you are converting back and forth, but then:

cdrs=> select convert_from(encode('Año','hex'), 'UTF-8');
ERROR:  function convert_from(text, unknown) does not exist
LINE 1: select convert_from(encode('Año','hex'), 'UTF-8');

And also, the encode() example will give different results depending on the database encoding; using explicit charsets and correct types frees me from that. After all, to make the full round trip a convert_to + encode + decode + convert_from chain is needed, and making it explicit makes things easier. Besides, not everybody knows that every value in the database passes through a conversion to/from text to be sent to or received from psql, and that you can use that if you just want the hex digits in the current database encoding:

cdrs=> select 'Año'::bytea;
   bytea
------------
 \x41c3b16f
(1 row)

But I doubt using a cast instead of encode() can be recommended. And as for the heavy lifting, you are just saving some keystrokes, which IMO is a cheap price to pay for a nearly self-documenting conversion chain.

Anyway, I think people should be told to respect types, and should be taught that strings are sequences of characters, and that to do what people think of as 'hex encoding' (two digits per *byte*, no delimiter) you first need to transform the string to bytes and then hex-encode those bytes.
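Just to make that concrete, here is the fully explicit chain for the same value (a minimal sketch on my side, assuming a UTF-8 target like the examples above):

cdrs=> select encode(convert_to('Año', 'UTF-8'), 'hex');
  encode
----------
 41c3b16f
(1 row)

cdrs=> select convert_from(decode('41c3b16f', 'hex'), 'UTF-8');
 convert_from
--------------
 Año
(1 row)

The charset is named at both ends, and each function takes and returns the right type (text in, bytea out, and back again), so the same statements produce the same bytes whatever the database encoding is.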
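You can even see the database-encoding dependence without switching databases, by forcing another charset explicitly (the LATIN1 target here is my own illustration, not anything from the OP's setup):

cdrs=> select encode(convert_to('Año', 'LATIN1'), 'hex');
 encode
--------
 41f16f
(1 row)

In a LATIN1-encoded database the implicit encode('Año','hex') above would give you exactly this, 41f16f, instead of 41c3b16f, without any change in the query text.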
Nearly all the encoding problems I see arise because people think strings are sequences of bytes, which they ceased to be when multibyte encodings arrived (and even without them, in Java, and I think in some of the Windows NT APIs, strings were sequences of 16-bit units, first UCS-2, then UTF-16). Once people stop trying to encode/decode strings directly, their problems normally vanish.

Francisco Olarte.