Search Postgresql Archives

Re: COPY 'invalid byte sequence for encoding "UTF8": 0xff'

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Chris Worley wrote:
...
For example, if I create a
table with a single bytea column called "foo" and insert the a record with
value "^F^O^O" then dump it, the dump will have the following:

COPY byteatest (foo) FROM stdin;
\\006\\017\\017
\.

How does pg_dump and pg_restonre handle everything with no errors?
See above. It escapes the data so it can be represented in a query. The Ctrl-F in the bytea column is converted for insertion purposes to its 3-digit octal equivalent of 006 represented as single escaped character in the data as \\006. The Ctrl-O is \\017. Both \\006 and \\017 are single characters. You _can_ escape all characters - ie instead of 'A' you could use '\\101' but you are only _required_ to escape those certain characters listed in the documentation. Again, see http://www.postgresql.org/docs/8.3/static/datatype-binary.html.
The column is basically telnet buffers which contains escape sequences
so i have bytes such as 0xff, 0x1b...   Piping the output through
iconv helped formatting some of the data, but it appears I am still
left with some, such as 0xff.
This makes no sense. Binary data could be an executable, a video, a photo, audio data, etc. but it is not linguistic characters where a character-set interpretation is appropriate. You may pipe it through iconv and find there are fewer errors in the attempted import but that does not imply that output is in any way improved by the transformation. (I can also feed sox to a JPEG photo, tell it that the data is mu-law, and have it create an MP3 but the result would just be painful noise.)
It would be nice to know what sql pg_dump uses to create the contents
inside of the copy command.  Any idea what that is?

Yes, it escapes those bytea characters that require escaping and leaves the others alone. Here is the same example from before with the string "^F^O^Obar" (foo in control-characters, bar in lowercase) in the bytea column:

COPY byteatest (foo) FROM stdin;
\\006\\017\\017bar
\.

You need to do the same in your program. Most client interfaces have an escape-bytea function that will handle it for you. In C you could use PQescapeByteaConn while PHP uses pg_escape_bytea. Find the appropriate function for your language.

Cheers,
Steve


--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux