On Fri, Dec 2, 2011 at 16:16, Torsten Zuehlsdorff <foo@xxxxxxxxxxxxxxxxxxx> wrote: > But i clearly have a missunderstanding of other chars, like umlauts or utf-8 > chars. This, for example, should return a 'ö': > > # SELECT chr(x'C3B6'::int); > chr > ----- > 쎶 > (1 row) That gives you the Unicode codepoint C3B6, but %C3%B6 is UTF-8-encoded and actually decodes to the codepoint 00F6. There is a fundamental problem that a decoded URL may actually be a binary string -- it might not have a textual representation at all. But if text is what you want, RFC3986 strongly suggests using UTF-8 for encoding text strings in URLs, and that works almost always in the real world. So the *right* way is to first convert the URL to a binary "bytea" type by fixing all the % escapes, then convert that to UTF-8 encoding to handle multibyte characters. What I came up with is far from elegant because PostgreSQL lacks convenient functions for bytea manipulation (no bytea_agg, etc). Stealing a little from Merlin, this is what it looks like: CREATE OR REPLACE FUNCTION url_decode(input text) RETURNS text LANGUAGE plpgsql IMMUTABLE STRICT AS $$ DECLARE bin bytea = ''; byte text; BEGIN FOR byte IN (select (regexp_matches(input, '(%..|.)', 'g'))[1]) LOOP IF length(byte) = 3 THEN bin = bin || decode(substring(byte, 2, 2), 'hex'); ELSE bin = bin || byte::bytea; END IF; END LOOP; RETURN convert_from(bin, 'utf8'); END $$; db=# select url_decode('Hell%C3%B6%20World%21'); url_decode -------------- Hellö World! db=# select url_decode('%EC%8E%B6'); url_decode ------------ 쎶 This will break for binary-encoded data in URLs, though. db=# select url_decode('%fa%fa%fa'); ERROR: invalid byte sequence for encoding "UTF8": 0xfa CONTEXT: PL/pgSQL function "url_decode" line 13 at RETURN ---- On Fri, Dec 2, 2011 at 17:46, Merlin Moncure <mmoncure@xxxxxxxxx> wrote: > set client_encoding to latin1; > postgres=# select unencode('Hell%C3%B6%20World%21'); > unencode > --------------- > Hellö World! > (1 row) Sorry, but AFAICT this makes a mess of encodings and only works by pure luck. The server thinks it's sending the client LATIN1 text, but it's actually UTF8-encoded and the last decoding step is done by your terminal. Regards, Marti -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general