It's simple to remove strange chars with regexp_replace.

2011/10/1, Leif Biberg Kristensen <leif@xxxxxxxxxxxxxx>:
> On Saturday 1. October 2011 21.29.45 Andrew Sullivan wrote:
>> I see you found it, but note that it's _not_ a spurious UTF-8
>> character: it's a right-to-left mark, and is a perfectly ok UTF-8 code
>> point.
>
> Andrew,
> thank you for your reply. Yes I know that this is a perfectly legal UTF-8
> character. It crept into my database as a result of a copy-and-paste job from
> a web site. The point is that it doesn't have a counterpart in ISO-8859-1 to
> which I regularly have to export the data.
>
> The offending character came from this URL:
> <http://www.soge.kviteseid.no/individual.php?pid=I2914&ged=Kviteseid.GED&tab=0>
>
> and the text that I copied and pasted from the page looks like this in the
> source code:
>
> Aslaug Steinarsdotter Fjågesund &lrm;(I2914)&lrm;
>
> I'm going to write to the webmaster of the site and ask why that character,
> represented in the HTML as the &lrm; entity, has to appear in a Norwegian web
> site which never should have to display text in anything but left-to-right
> order.
>
>> If you need a subset of the UTF-8 character set, you want to make sure
>> you have some sort of constraint in your application or your database
>> that prevents insertion of anything at all in UTF-8. This is a need
>> people often forget when working in an internationalized setting,
>> because there's a lot of crap that comes from the client side in a
>> UTF-8 setting that might not come in other settings (like LATIN1).
>
> I don't want any constraint of that sort. I'm perfectly happy with UTF-8. And
> now that I've found out how to spot problematic characters that will crash my
> export script, it's really not an issue anymore. The character printed
> neither in psql nor in my PHP frontend, so I just removed the problematic text
> and re-entered it by hand. Problem solved.
>
> But thank you for the idea, I think that I will strip out at least any &lrm;
> entities from text entered into the database.
>
> By the way, is there a setting in psql that will output unprintable characters
> as question marks or something?
>
> regards, Leif.
>
> --
> Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

--
------------
pasman
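
To make the suggestion at the top of this message concrete, here is a minimal sketch of the same idea in Python. Python is my assumption; the thread itself only mentions psql and a PHP frontend. The helper names strip_bidi_marks and non_latin1_chars are hypothetical, and I am assuming the "strange chars" are the Unicode directional marks U+200E and U+200F. In PostgreSQL, a similar character class could presumably be passed to regexp_replace with the 'g' flag.

```python
import re

# Assumption: the unwanted characters are the bidirectional control marks
# U+200E (LEFT-TO-RIGHT MARK) and U+200F (RIGHT-TO-LEFT MARK).
BIDI_MARKS = re.compile("[\u200e\u200f]")


def strip_bidi_marks(text):
    """Return text with every LRM/RLM mark removed (hypothetical helper)."""
    return BIDI_MARKS.sub("", text)


def non_latin1_chars(text):
    """List (index, code point) for characters ISO-8859-1 cannot encode.

    ISO-8859-1 covers exactly U+0000..U+00FF, so anything above U+00FF
    would make an export to that encoding fail.
    """
    return [(i, "U+%04X" % ord(ch)) for i, ch in enumerate(text) if ord(ch) > 0xFF]


# Sample modelled on the text quoted above, with the marks written as escapes.
sample = "Aslaug Steinarsdotter Fj\u00e5gesund \u200e(I2914)\u200e"

print(non_latin1_chars(sample))     # where the problem characters sit
cleaned = strip_bidi_marks(sample)
exported = cleaned.encode("iso-8859-1")  # would raise UnicodeEncodeError on sample
```

Listing the offenders first, instead of silently stripping everything above U+00FF, keeps legitimate characters that merely lack a LATIN1 counterpart visible, so they can be handled by hand.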