Re: Re: TODO: Expose parser support for decoding unicode escape literals to user

Adrian Klaver <adrian.klaver@xxxxxxxxxxx> · Thu, 15 May 2014 07:41:45 -0700

On 05/15/2014 07:13 AM, David G Johnston wrote:
Adrian Klaver-4 wrote
On 05/15/2014 01:31 AM, Craig Ringer wrote:
Hi all

I just noticed a Stack Overflow question
(http://stackoverflow.com/q/20124393/398670) where someone's asking how
to decode '\u0000' style escapes *stored in database text fields* into
properly encoded text strings.

The parser supports this for escape-strings, and you can write E'\u011B'
to get 'ě' because of
http://postgresql.1045698.n5.nabble.com/Unicode-escapes-in-literals-td1992313.html.

I don't see this exposed in a way that users can call directly, though.
'decode(text, text)' accepts the 'escape' format, but it expects octal.

It's possible to use PL/PgSQL's 'EXECUTE' to use the parser to do the
work, but that's downright awful.

Am I missing something obvious, or is this something that'd be a good
new-developer TODO?

Not sure if this is what you want?:

test=> SELECT quote_literal(E'test \u011B');
   quote_literal
---------------
   'test ě'

Except the data is already in the database and there is no way to put an "E"
in front of a column name and cause PostgreSQL to process the escapes
embedded in the column's value in the same way it processes a literal.

Yeah, that is a problem.

WITH src (txt) AS ( VALUES ('A \u011B C') )
SELECT txt FROM src;

Hence the need for a function to perform the same process that the parser
performs when dealing with literals.

David J.
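
For illustration, the kind of function being asked for could be sketched in
PL/pgSQL roughly as below. This is only a sketch under stated assumptions, not
an existing PostgreSQL function: the name decode_unicode_escapes is
hypothetical, the database encoding is assumed to be UTF8, and
standard_conforming_strings is assumed to be on. It handles only the
four-digit \uXXXX form, does not combine surrogate pairs, does not treat a
doubled backslash as an escaped backslash the way the parser does, and will
raise an error on \u0000 because chr(0) is not allowed.

```sql
-- Hypothetical helper, not a built-in: decode \uXXXX escapes stored in text.
-- Assumes a UTF8 database and standard_conforming_strings = on.
CREATE OR REPLACE FUNCTION decode_unicode_escapes(src text)
RETURNS text
LANGUAGE plpgsql
IMMUTABLE STRICT
AS $$
DECLARE
    result text := '';   -- decoded output accumulated so far
    rest   text := src;  -- input not yet scanned
    hex    text;         -- four hex digits of the next escape
    pos    int;          -- position of that escape within rest
BEGIN
    LOOP
        -- Capture the hex digits of the leftmost \uXXXX escape, if any.
        hex := substring(rest from '\\u([0-9A-Fa-f]{4})');
        EXIT WHEN hex IS NULL;

        pos := strpos(rest, '\u' || hex);

        -- Copy the text before the escape, then the decoded character.
        result := result
                  || left(rest, pos - 1)
                  || chr(('x' || hex)::bit(16)::int);

        -- Skip past the six characters of the escape (\uXXXX).
        rest := substr(rest, pos + 6);
    END LOOP;

    RETURN result || rest;
END;
$$;

-- Applied to the example above:
WITH src (txt) AS ( VALUES ('A \u011B C') )
SELECT decode_unicode_escapes(txt) FROM src;
```

The intent is for the query to return the same text the parser produces for
E'A \u011B C'. Scanning left to right in a single pass, instead of running
replace() once per distinct escape, avoids decoding output that only looks
like an escape after an earlier substitution.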

--
Adrian Klaver
adrian.klaver@xxxxxxxxxxx