Search Postgresql Archives

bytea and character encoding when inserting escaped literals

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

I've searched the archives a fair amount on this topic, but have not found quite the answer / explanation I'm looking for. I attribute this to my eternal confusion over character encoding issues in all environments, so I apologize in advance for what might be a stupid question. :)

I'mm running Postgresql 8.3.1 on WinXP. I have a UTF8 database into which I'm trying to execute a series of INSERT INTO DDL statements. One of the columns in the table I'm inserting into is a BYTEA column, intended to hold the bytes that are the representation of a (small) image.[1]

I had thought -- apparently erroneously -- that because this is not a text based column, that I could send any string of bytes (octets) via my INSERT statement to populate values in this column. I'm using escaped string literals with hexadecimal representation so my INSERTs look something like:

INSERT INTO myTable VALUES (..., E'\x15\x1C\x2F\x00\x02...', ...) ;

As you might be able to guess, I'm getting the error:

  ERROR: Invalid byte sequence for encoding "UTF8": 0x00

(I get the error whether I attempt this via JDBC or via the command-line client with client encoding set to UTF8 or WIN1252.)

Again, I was surprised by this error since I thought from the documentation at [2] that the server would only expect to be dealing in a sequence of octets here, without any character-encoding constraints implied by the DB's encoding.

What is the actual cause of this error, and how do I workaround it? Do I need to pretend that my data is Unicode character data and specify the UTF8 octets for that character data in my E'...' literal?

thanks in advance for any help!

Lee

PS [3]

[1] Actually, this DDL has been converted from that for a different DB that uses LONGVARBINARY for this. BYTEA was my best guess for the Postgresql equivalent.

[2] http://www.postgresql.org/docs/8.3/interactive/datatype-binary.html

[3] I also was confused as to why 0x00 would be an invalid UTF8 byte sequence. On its own, as I understand it, 0x00 is a fine UTF8 byte sequence (representing Unicode codepoint 0). And when I (from the command line) try to insert other invalid UTF8 sequences -- such as INSERT INTO foo VALUES (E'\xC0\x80') I get an error that mentions the full byte sequence as invalid: "invalid byte sequence for encoding "UTF8": 0xc080". So this further confuses me. :-)


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux