> Tatsuo Ishii wrote: > > Sent: Sunday, May 08, 2005 12:01 PM > > To: linux@xxxxxxxxxxx > > Cc: pgsql-general@xxxxxxxxxxxxxx; pgsql-hackers@xxxxxxxxxxxxxx > > Subject: Re: [HACKERS] Invalid unicode in COPY problem > > > > We have developed patches which relaxes the character > > validation so that PostgreSQL accepts invalid characters. It > > works like this: > > That is just plain 100% wrong!! > > Under no circumstances should there be invalid data in a database. > And if you're trying to make a database of invalid data, then at > least encode it using a valid encoding. > > In fact, I've proposed strengthening the validation routines for UTF-8. Actually I myself thought as you are before. Later I found that it was not so good idea. People already have invalid encoded data in their precious database and have very hard time to migrate to newer version of PostgreSQL because of encoding validation. Think about this kind of situation: There is a table t1(member_id integer primary key, member_name text, address text, phone text, email text). I have to reach each member by either adress, phone or email. Unfortunately some of address field have wrong encoded data. In this case I will use phone or email to reach them. Now I need to upgrade to newer PostgreSQL within 1 day. I know I have to fix wrong encoded field but it will take more than 1 day. So I would like to import the data first then fix wrong encoded field on running database since I can reach members by phone or email even with wrong encoded address field... I saw this kind of situation in the real world and that's why we developed the patches. -- Tatsuo Ishii ---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend