Re: SELECT d02name::bytea FROM ... && DBI::Pg

Francisco Olarte <folarte@xxxxxxxxxxxxxx> · Sat, 12 Oct 2019 13:33:59 +0200

Matthias:

On Thu, Oct 10, 2019 at 7:26 PM Matthias Apitz <guru@xxxxxxxxxxx> wrote:
> Hmm. But *I* do need the content in hex to see if the varchar column
> contains correct encoded UTF-8 data. We're on the way to port a huge
> database application from Sybase to PostgreSQL and are facing every kind of
> problem one can think of. Magically, sometimes strings, expected to be
> coded in UTF-8, arrive in the Perl $variables coded in ISO-8859-1 and then cause
> other problems when German Umlauts should be translated into HTML
> encodings like &uuml; etc. to be presented in the web browser.

This seems to be a perl problem ( I've had my share of them ). I
suppose you can convince perl to upgrade all your variables to utf-8,
but I do not remember how.
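
One way to do that conversion by hand is decode() from the core Encode module. What follows is a minimal sketch, assuming the value arrives as raw UTF-8 octets; the helper name utf8_octets_to_chars is made up for illustration. DBD::Pg also documents a pg_enable_utf8 connection attribute that affects whether returned strings are flagged as UTF-8, so that may be worth checking first.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn UTF-8 octets into a Perl character string.
sub utf8_octets_to_chars {
    my ($octets) = @_;
    return decode('UTF-8', $octets);
}

# "ü" as two UTF-8 octets, as it might arrive from the driver (assumption).
my $bytes = "\xc3\xbc";
my $chars = utf8_octets_to_chars($bytes);

# Two octets before decoding, one character after.
printf "octets: %d, characters: %d\n", length($bytes), length($chars);
```

If the octets are really ISO-8859-1, decoding them as UTF-8 will not give the expected characters, which is one more reason to look at the raw bytes first.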

Anyway, if you want "the hex representation of the bytea equivalent of
a field" ( of which I do not remember the original type ), why don't
you ask for it?

If you ask for bytea you are asking for a binary string. The fact that
it is transmitted hex-encoded on the wire is an implementation detail.

psql is a text mode interface. To represent binary strings to you it
needs text, so it uses hex.

perl can do binary, so it puts the content in binary. After all your
bytea could have been a jpeg image for all perl/DBI knows.

But you can easily tell perl to pass a binary to hex, with unpack.
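
A minimal sketch of the unpack approach, assuming the value holds raw bytes ( as a bytea result does ); the helper name to_hex is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: hex-dump a byte string, two hex digits per byte.
sub to_hex {
    my ($bytes) = @_;
    return unpack('H*', $bytes);
}

print to_hex("\xc3\xbc"), "\n";   # prints c3bc: "ü" encoded as UTF-8
print to_hex("\xfc"), "\n";       # prints fc:   "ü" encoded as ISO-8859-1
```

Comparing the two outputs is enough to tell which encoding a given umlaut arrived in.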

OTOH, if you want, you can tell postgres to send you hex: look for
encode/decode in the relevant manual pages, they are under binary
string functions, IIRC. pg will then build a text hex string, send it
on the wire by whatever mechanism it chooses, and you'll get the hex
data from DBI. Do not ask pg+DBI for binary ( ::bytea ) and expect
text.
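
A minimal sketch of asking the server for hex text, reusing the d02name::bytea cast from the subject line. The helper name hex_query and the table name some_table are placeholders, since the original query elides the table, and the commented DBI call assumes an already connected handle $dbh:

```perl
use strict;
use warnings;

# Hypothetical helper: build a query that makes the server return hex text
# instead of a binary string. $table must be a trusted identifier, because
# it is interpolated straight into the SQL.
sub hex_query {
    my ($table) = @_;
    return "SELECT encode(d02name::bytea, 'hex') FROM $table";
}

# With a connected DBI handle ( $dbh, assumed ) it could be run as:
#   my $hex_values = $dbh->selectcol_arrayref(hex_query('some_table'));
print hex_query('some_table'), "\n";
```

The values that come back are plain text, so no unpack is needed on the perl side.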

> Perl (and Java) sucks, it does magic things below the surface of
> string (objects). That's why I like C :-)

They have differing degrees of suckiness. I've read the Java String
sources, and been horrified by them. Perl, OTOH, sucks for many
things, but has its points. As for C, I try to use C++ for everything
I can ( many times using it as just a better ( for me ) C compiler;
I've been known for writing huge chunks of C++ that use malloc/stdio
and friends all around, but I've found programs have way fewer bugs
when using tighter types ). It's a taste question, and of course I
wouldn't like to redo in C my hundreds of <100-line perl scripts for
parsing huge texts and extracting some data. But on utf-8/latin-1
you're right: you have umlauts, we have accents and n-tildes, I've
been there and will be again. Enough off-topic anyway.

Francisco Olarte.