On Thu, Jun 26, 2008 at 03:31:01PM +0200, Albe Laurenz wrote:
> Michael Fuhr wrote:
> > Your input data seems to have a mix of encodings: sometimes you're
> > getting pound signs in a non-UTF-8 encoding, but if characters like
> > <U+2019 RIGHT SINGLE QUOTATION MARK> got into the database when
> > client_encoding was set to UTF8 then at least some data must have
> > been in UTF-8.
>
> Sorry, but that's not true.
> That character is 0x9s in WINDOWS-1252.

I think you mean 0x92.

> So it could have been that client_encoding was (correctly) set to WIN1252
> and the quotation mark was entered as a single byte character.

Yes, *if* client_encoding was set to win1252.  However, in the
following thread Garry said that he was getting encoding errors when
entering the pound sign that were resolved by changing client_encoding
(I suggested latin1, latin9, or win1252; he doesn't say which he
used):

http://archives.postgresql.org/pgsql-general/2008-06/msg00526.php

If client_encoding had been set to win1252 then Garry wouldn't have
gotten encoding errors when entering the pound sign because that
character is 0xa3 in win1252 (also in latin1 and latin9).  So either
applications are setting client_encoding to different values, sometimes
correctly and sometimes incorrectly (Garry, do you know if that could
be happening?), or the data is sometimes in different encodings.  If
the data is being entered via a web application then the latter seems
more likely, at least in my experience (I've had to deal with exactly
this problem recently).

-- 
Michael Fuhr
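
P.S. For anyone who wants to check the byte values mentioned above, here is
a minimal Python sketch. It is not from the thread. It assumes Python's
codec names cp1252, latin-1 and iso8859-15 as stand-ins for PostgreSQL's
win1252, latin1 and latin9, and the helper names encode_hex and
is_valid_utf8 are made up for illustration.

```python
# Characters discussed in this thread.
POUND = "\u00a3"   # POUND SIGN
RSQUO = "\u2019"   # RIGHT SINGLE QUOTATION MARK


def encode_hex(ch, encoding):
    """Return the bytes of ch in the given encoding as a hex string,
    or None if the encoding has no representation for ch."""
    try:
        return ch.encode(encoding).hex()
    except UnicodeEncodeError:
        return None


def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Assumed mapping: Python codec name -> PostgreSQL encoding name.
CODECS = {
    "cp1252": "win1252",
    "latin-1": "latin1",
    "iso8859-15": "latin9",
    "utf-8": "utf8",
}

for codec, pg_name in CODECS.items():
    print(pg_name, "pound:", encode_hex(POUND, codec),
          "quote:", encode_hex(RSQUO, codec))
```

The single byte 0xa3 is a pound sign in win1252, latin1 and latin9, but on
its own it is not valid UTF-8, so is_valid_utf8(b"\xa3") returns False.
That is consistent with getting encoding errors when such a byte is sent
while client_encoding is UTF8. The right single quotation mark has a
single-byte form only in win1252 (0x92) among these encodings.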