Re: How to find freak UTF-8 character?

It's simple to remove strange characters with regexp_replace.
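
A minimal sketch of the same strip-by-regex idea, written in Python rather
than with PostgreSQL's regexp_replace. The set of characters to strip and the
name strip_invisible are assumptions made for illustration; they are not taken
from this thread, and the set would need adjusting to the data at hand.

```python
import re

# Invisible formatting characters to remove. This particular selection
# (zero-width characters, LRM/RLM, bidi embedding controls, word joiner,
# BOM) is an assumption, not an exhaustive list.
INVISIBLE = re.compile("[\u200b-\u200f\u202a-\u202e\u2060\ufeff]")


def strip_invisible(text: str) -> str:
    """Return text with the characters matched by INVISIBLE removed."""
    return INVISIBLE.sub("", text)


# The pasted name from the thread, with the HTML entities decoded:
# two no-break spaces and two left-to-right marks (U+200E).
sample = "Aslaug Steinarsdotter Fj\u00e5gesund\u00a0\u00a0\u200e(I2914)\u200e"
cleaned = strip_invisible(sample)

# After stripping, the string can be encoded as ISO-8859-1 without error,
# because the no-break space and the letter a-ring both exist there.
exported = cleaned.encode("iso-8859-1")
```

The no-break spaces are left alone on purpose, since ISO-8859-1 can represent
them; only the marks that have no counterpart are dropped.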

2011/10/1, Leif Biberg Kristensen <leif@xxxxxxxxxxxxxx>:
> On Saturday 1. October 2011 21.29.45 Andrew Sullivan wrote:
>> I see you found it, but note that it's _not_ a spurious UTF-8
>> character: it's a right-to-left mark, and is a perfectly ok UTF-8 code
>> point.
>
> Andrew,
> thank you for your reply. Yes I know that this is a perfectly legal UTF-8
> character. It crept into my database as a result of a copy-and-paste job from
> a web site. The point is that it doesn't have a counterpart in ISO-8859-1 to
> which I regularly have to export the data.
>
> The offending character came from this URL:
> <http://www.soge.kviteseid.no/individual.php?pid=I2914&ged=Kviteseid.GED&tab=0>
>
> and the text that I copied and pasted from the page looks like this in the
> source code:
>
> Aslaug Steinarsdotter Fjågesund&nbsp;&nbsp;&lrm;(I2914)&lrm;
>
> I'm going to write to the webmaster of the site and ask why that character,
> represented in the HTML as the &lrm; entity, has to appear in a Norwegian web
> site which never should have to display text in anything but left-to-right
> order.
>
>> If you need a subset of the UTF-8 character set, you want to make sure
>> you have some sort of constraint in your application or your database
>> that prevents insertion of just anything at all in UTF-8.  This is a need
>> people often forget when working in an internationalized setting,
>> because there's a lot of crap that comes from the client side in a
>> UTF-8 setting that might not come in other settings (like LATIN1).
>
> I don't want any constraint of that sort. I'm perfectly happy with UTF-8. And
> now that I've found out how to spot problematic characters that will crash my
> export script, it's really not an issue anymore. The character printed
> neither in psql nor in my PHP frontend, so I just removed the problematic text
> and re-entered it by hand. Problem solved.
>
> But thank you for the idea, I think that I will strip out at least any &lrm;
> entities from text entered into the database.
>
> By the way, is there a setting in psql that will output unprintable
> characters as question marks or something?
>
> regards, Leif.
>
> --
> Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
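
For spotting characters that have no ISO-8859-1 counterpart before an export
runs, a check along the following lines may help. It is only a sketch in
Python: find_unencodable is a hypothetical helper, and the sample string is the
pasted name quoted above with its HTML entities decoded. It says nothing about
psql settings.

```python
import unicodedata


def find_unencodable(text: str, encoding: str = "iso-8859-1"):
    """List (position, code point, Unicode name) for every character
    in text that cannot be encoded in the target encoding."""
    hits = []
    for pos, ch in enumerate(text):
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((pos, f"U+{ord(ch):04X}", name))
    return hits


# "Fjågesund", two no-break spaces, LRM, "(I2914)", LRM
sample = "Fj\u00e5gesund\u00a0\u00a0\u200e(I2914)\u200e"
problems = find_unencodable(sample)

# Alternatively, let the codec substitute a question mark for anything
# it cannot represent instead of raising an error.
lossy = sample.encode("iso-8859-1", errors="replace")
```

Here problems holds two entries, both U+200E LEFT-TO-RIGHT MARK, and lossy
contains a "?" byte in each of those two positions. Reporting the position and
name makes an invisible character findable even when it does not print.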


-- 
------------
pasman
