On 2021-04-26 07:45:26 -0500, Ron wrote:
> On 4/26/21 7:32 AM, Peter J. Holzer wrote:
> > On 2021-04-26 06:49:18 -0500, Ron wrote:
> > > The destination is an (RDS) Postgresql 12.5 with encoding UTF8, and is being
> > > loaded through COPY commands generated by ora2pg.
> > >
> > > The source table has a BLOB column (I think they are scanned images) which
> > > I'm loading into a Postgresql bytea column.
> > >
> > > Seven times out of about 60M rows, I get this error:
> > > Psql:909242: ERROR: invalid byte sequence for encoding "UTF8": 0xed 0xaf 0xbf
> >
> > Decoding UTF8 doesn't make sense for a bytea column. How does that data
> > look like in the file generated by ora2pg?
>
> I thought it was weird, too, but COPY has to read text, no?

Yes, but data for a bytea column would normally be encoded in hex or
something like that ...

> COPY mv_response_attachment_old (response_attachement_id,binary_data,employer_response_id,attachment_id_code,file_type,attachment_desc,attachment_size,file_name,partition_date,prior_incident_id,part_date)
> FROM STDIN;
> 1583201 \\x255044462d312e330d25e2e3cfd30d0a31362030206f... ...

Yes, like this. There are only hex digits (plus \ and x) in the column,
nothing which would require decoding UTF-8.

My guess is that the error is actually in the data for another column.
I'd try to identify the broken records and check whether they contain
some other strange content.

        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@xxxxxx         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
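
One way to identify the broken records is to scan the ora2pg output file for lines that are not valid UTF-8 before feeding it to psql. The following is only a rough sketch, not anything ora2pg or psql provides: the function name find_invalid_utf8_lines is made up for illustration, and it assumes the data lines are in COPY text format (tab-separated columns, one record per line). Python's strict UTF-8 decoder rejects the sequence 0xed 0xaf 0xbf (an encoded surrogate), so it should flag the same lines the server complains about.

```python
def find_invalid_utf8_lines(path):
    """Yield (line_number, column_index, bad_bytes_hex) for each line
    of the file at `path` that is not valid UTF-8.

    Assumption: COPY text format, so columns are separated by literal
    tab bytes and column_index (0-based) is the number of tabs that
    precede the first offending byte.
    """
    with open(path, "rb") as f:
        for lineno, raw in enumerate(f, start=1):
            try:
                raw.decode("utf-8")  # strict decoding, raises on bad input
            except UnicodeDecodeError as exc:
                column = raw[:exc.start].count(b"\t")
                # Show the offending byte plus the two bytes after it.
                context = raw[exc.start:exc.start + 3].hex()
                yield lineno, column, context
```

Iterating over find_invalid_utf8_lines("output.sql") (a hypothetical file name) and printing each tuple gives the line to inspect and the position of the suspect column in the COPY column list, which makes it easy to check whether the problem sits in binary_data or in one of the text columns.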