Re: PostgreSQL JDBC Driver ERROR: invalid byte sequence for encoding "UTF8": 0x00 on Select stmt

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Samed, 

Thanks for the detailed response. I still have few questions that I would like to check my understanding on. 

First let me provide some additional background information. 

Turning to the  application and the errors on the select statements. 

Thanks again for any additional insight provided. 

Regards, 

Craig Jackson

On Aug 8, 2020, 5:58 AM -0600, Samed YILDIRIM <samed@xxxxxxxxxx>, wrote:
Hi,
 
First of all, 0x00 character is not a valid UTF-8 character. And, PostgreSQL is more strict compare to Oracle about character encodings.
select count(*) from <table_name> where instr(<table_cole>, unistr('\0000')) > 0;
If your character encoding of your Postgres database is UTF-8 and data type of the column is a type for storing character, such as varchar, text etc., answer is no. There is no equivalent in Postgres of the query above. Reason is simple, you can't store character 0x00 by the reason that it is not a valid character string :)
 
Most probably, while copying of data from Oracle to Postgres, oracle_fdw or oracle driver it used skipped character 0x00. So, your data in PostgreSQL is clean in the aspect of character 0x00.
 
The question is what is the source of the errors. Answer is that the application itself is the source of the errors. Your application sends invalid UTF-8 strings to database. Also the application is responsible to invalid UTF-8 strings stored in Oracle database. You should check and fix your application.  
Best regards.
Samed YILDIRIM
 
08.08.2020, 12:59, "Hotmail" <crajac66@xxxxxxxxxxx>:
Hi All, 

We migrated data from Oracle to postgres using oracle_fdw without seeing any invalid byte sequence for encoding "UTF8": 0x00 errors. The data transfer from Oracle to Postgres using oracle_fdw completed successfully without any errors.

After we started our application using Postgres as the new data source we are consistently seeing "invalid byte sequence for encoding "UTF8": 0x00 errors" once or twice a minute on a basic select statement like this:

Select * from <table> where <col1>=‘foo’ and <col2>=‘foo2’;

Not all queries fail just some. We can select * from the table in question from psql without any invalid byte sequence errors (A problem with the jdbc client maybe?)


We went back to the source Oracle database and re-ran queries to try to find any invalid byte sequences and we could not find any. Again the data was copied with oracle_fdw with no errors. 

We would like to know if there are any queries we can run on the postgres table generating this error to identify the source of the invalid byte sequence errors. The query we used in the Oracle db to identify invalid byte sequences does not translate directly because it uses the instr oracle function. Here’s the oracle query:

select count(*) from <table_name> where instr(<table_cole>, unistr('\0000')) > 0;

Is there an equivalent query we could run in Postgres?

Regards,

Craig Jackson

[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux