Re: invalid byte sequence for encoding "UTF8": 0xf481 - how could this happen?

=>SELECT convert_to(content, 'UTF8') FROM tmp_article;
This works. My PostgreSQL is the latest release, 9.1.3, on an Ubuntu 10.04 server. We have millions of rows in the database, but this is the only one where we have hit the problem. The bad data was inserted in the last few days, and we upgraded to 9.1.3 right after it was released.

On 2012/4/16 16:31, Albe Laurenz wrote:
Rural Hunter wrote:
My db is in UTF-8. I have a row in my table, say tmp_article, and I wanted to generate a tsvector from the article content:
select to_tsvector(content) from tmp_article;
But I got this error:
ERROR:  invalid byte sequence for encoding "UTF8": 0xf481

I am wondering how this could happen. I think that if there were invalid UTF-8 bytes in the content, they should not have been accepted into the tmp_article table, because I sometimes see similar errors when inserting records into tmp_article. Am I right?
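The `0xf481` in the error message is the hex dump of the bytes PostgreSQL rejected. In UTF-8, a lead byte of 0xF4 announces a four-byte character, so 0xF4 0x81 must be followed by two more continuation bytes in the range 0x80 to 0xBF; if they are missing, or something else follows, the sequence is invalid. As a minimal sketch (Python, not part of the original thread; the helper name `first_invalid_utf8` is hypothetical), strict decoding can locate the first invalid sequence in a byte string and print it in the same `0x...` style:

```python
def first_invalid_utf8(data: bytes):
    """Return (offset, "0x..." hex string) for the first invalid UTF-8
    sequence in data, or None if data is valid UTF-8.

    Hypothetical helper for illustration: it relies only on Python's
    strict UTF-8 decoder, not on anything PostgreSQL provides.
    """
    try:
        data.decode("utf-8")  # strict mode raises on the first bad sequence
    except UnicodeDecodeError as exc:
        # exc.start/exc.end bracket the bytes the decoder rejected
        return exc.start, "0x" + data[exc.start:exc.end].hex()
    return None


# A 4-byte lead (0xF4) and one continuation byte (0x81), then a space:
print(first_invalid_utf8(b"abc\xf4\x81 def"))  # (3, '0xf481')
print(first_invalid_utf8("valid text".encode("utf-8")))  # None
```

The same result comes back if the value simply ends after 0xF4 0x81, since a truncated four-byte sequence is just as invalid as one followed by a non-continuation byte.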
You are right in theory. A lot depends on your PostgreSQL version, because the efforts to prevent invalid strings from entering the database have led to changes over the versions. Older versions are more permissive.

To test the theory that the contents of the table are bad, you can check whether the same error occurs when you run:

SELECT convert_to(content, 'UTF8') FROM tmp_article;

Yours,
Laurenz Albe
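The `convert_to` check above runs against the whole table at once. To find which of millions of rows is affected, one possible approach is to export the raw bytes of `content` (for instance by selecting `convert_to(content, 'UTF8')`, which returns `bytea`) together with a row key, and scan them on the client. The sketch below assumes the rows have already been fetched as `(row_id, bytes)` pairs by some client library; the function name `find_invalid_rows` and the sample data are hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def find_invalid_rows(
    rows: Iterable[Tuple[int, bytes]],
) -> Iterator[Tuple[int, int]]:
    """Yield (row_id, byte_offset) for each row whose bytes are not valid UTF-8."""
    for row_id, raw in rows:
        try:
            raw.decode("utf-8")  # strict decoding rejects any invalid sequence
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first rejected byte in this row
            yield row_id, exc.start


# Hypothetical sample data standing in for rows fetched from tmp_article.
sample = [
    (1, "plain ASCII".encode("utf-8")),
    (2, b"abc\xf4\x81"),              # truncated 4-byte sequence
    (3, "中文内容".encode("utf-8")),  # valid multi-byte text
]
print(list(find_invalid_rows(sample)))  # [(2, 3)]
```

Because each value is decoded on its own, memory use stays bounded to one row at a time when `rows` is a lazy iterator rather than a list.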



--
Sent via pgsql-admin mailing list (pgsql-admin@xxxxxxxxxxxxxx)

