Marti Raudsepp schrieb:
On Fri, Dec 2, 2011 at 16:16, Torsten Zuehlsdorff
<foo@xxxxxxxxxxxxxxxxxxx> wrote:
But i clearly have a missunderstanding of other chars, like umlauts or utf-8
chars. This, for example, should return a 'ö':
# SELECT chr(x'C3B6'::int);
chr
-----
쎶
(1 row)
That gives you the Unicode codepoint C3B6, but %C3%B6 is UTF-8-encoded
and actually decodes to the codepoint 00F6.
There is a fundamental problem that a decoded URL may actually be a
binary string -- it might not have a textual representation at all.
But if text is what you want, RFC3986 strongly suggests using UTF-8
for encoding text strings in URLs, and that works almost always in the
real world.
Text is what i want. :) I've created a highly specialiced CMS, which
handle a bunch of big sites (in meaning of a great numbers of users and
content). It has a build-in traffic-analyze and with this function it
creates a real time analyze of the keywords, a user used to find the
sites in search engines. This is very needful if you try to do SEO for
websites with more than 20.000 unique content-pages. :)
CREATE OR REPLACE FUNCTION url_decode(input text) RETURNS text
LANGUAGE plpgsql IMMUTABLE STRICT AS $$
DECLARE
bin bytea = '';
byte text;
BEGIN
FOR byte IN (select (regexp_matches(input, '(%..|.)', 'g'))[1]) LOOP
IF length(byte) = 3 THEN
bin = bin || decode(substring(byte, 2, 2), 'hex');
ELSE
bin = bin || byte::bytea;
END IF;
END LOOP;
RETURN convert_from(bin, 'utf8');
END
$$;
Hey, this function looks similar to my encoding function :) Thank you
very munch!
This will break for binary-encoded data in URLs, though.
Thats no problem, i just have text.
Big thanks to all of you,
Torsten
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general