Instead of adding an FAQ entry, which might not be found when the error is generated, I added a HINT for 8.2 that will appear with the error message: errmsg("invalid byte sequence for encoding \"%s\": 0x%s", pg_enc2name_tbl[encoding].name, buf), errhint("This failure can also happen if the byte sequence does not " "match the encoding expected by the server, which is controlled " "by \"client_encoding\"."))); Supplying information at the point of error is usually the best solution, if possible. Backpatched to 8.1.X as well. --------------------------------------------------------------------------- Tim Allen wrote: > Matthew T. O'Connor wrote: > > Well, to answer my own question, I hacked the source code of DBMail and > > had it set the client encoding to LATIN1 immediately after database > > connect, this seems to have fixed the problem. > > > > Sorry for the noise, > > > > Matt > > I've seen this sort of problem asked about in the mailing lists often > enough to think it merits a FAQ entry, so how about this text: > > <entry> > Q. Why do I have problems inserting text into my database, with error > messages like > > ERROR: invalid byte sequence for encoding "UTF8": 0xe1202c ? > > A. Almost certainly that byte sequence really is an invalid byte > sequence for that encoding. The reason you are seeing the error is > probably because you are providing text in some other encoding. You and > the database need to agree between you what encoding you're using. > PostgreSQL is fairly good at working with you, converting to and from > whatever encoding you want to use, but you need to tell it what that > encoding is, and then stick to that encoding consistently. > > If you don't set the client encoding, then PostgreSQL will use the > default encoding for the database, which in modern times is often UTF8 > (aka UNICODE), and is set at database creation time. However, many > client apps still use other encodings, (eg Latin1, aka ISO-8859-1), so > you need to either educate the client app to use UTF8, or get it to > inform PostgreSQL what other encoding to use. > > The way to tell PostgreSQL what encoding you want to use is by use of > the client_encoding GUC variable, eg > > set client_encoding to 'LATIN1'; > > One reason you may be seeing this problem now, after upgrading your > version of PostgreSQL, is that recent versions have tighter validation > of encoded text. Previously you may not have been conscious of what > encoding you were actually using, especially if you're a speaker of a > Western European language, and may have gotten away with writing > incorrectly-encoded text without the database complaining. Now is the > time to start getting it right. > > One thing to be wary of is the "SQL_ASCII" encoding. It appears to be > commonly and incorrectly believed that this represents either some > variant on latin1, or pure 7-bit ASCII. It is neither of those, but a > completely unchecked encoding that really means whatever you want it to > mean. This makes it not a very good encoding to use in practice, as it > becomes prone to allowing a mixture of different encodings to be present > in the same set of data, which will cause you headaches when you try to > convert the whole lot to some consistent encoding in the future. > > See section 21.2 of the documentation for more complete information. > </entry> > > Tim > > -- > ----------------------------------------------- > Tim Allen tim@xxxxxxxxxxxxxxxx > Proximity Pty Ltd http://www.proximity.com.au/ > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq -- Bruce Momjian bruce@xxxxxxxxxx EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +