Finding "corrupt" values in UTF-8 tables (regexp question, I think)

I'm noticing that some of my data has been imported as junk text:

For instance:

    klciã«"

What SQL would find data like this? The column may contain only
alphanumeric characters plus the symbols "-" and "_", so I tried this
regexp query:

    select id, t_code
    from traders
    where t_code ~ '[^A-Za-z1-9\-]'
    limit 100;

But this returns values such as "181xn-807199", which should be valid
according to the regexp above. Why? Also, when I try to include the
underscore, as follows...

    select id, t_code
    from traders
    where t_code ~ '[^A-Za-z1-9\-\_]'
    limit 100;

This gives me an error: "ERROR:  invalid regular expression: invalid
character range".
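A minimal sketch of the character-class logic, using Python's `re` module as a stand-in for PostgreSQL's regex engine (an assumption: the two agree on these simple bracket expressions, though not on every escape). It shows that `1-9` leaves out the digit 0, which would explain why "181xn-807199" matches, and a corrected class `[^A-Za-z0-9_-]` with the underscore unescaped and the hyphen placed last so it is taken literally. The "invalid character range" error is probably because, when backslash escapes are processed in ordinary string literals (the PostgreSQL default before 9.1), `\-\_` reaches the regex engine as `-_`, leaving a stray `-` right after the range `1-9`; keeping `-` last in the brackets avoids that.

```python
import re

# Class from the first query: "1-9" leaves out the digit 0.
ORIGINAL = re.compile(r"[^A-Za-z1-9\-]")
# Corrected class: "0-9" covers every digit, "_" needs no escape,
# and "-" placed last inside the brackets is a literal hyphen.
CORRECTED = re.compile(r"[^A-Za-z0-9_-]")


def has_disallowed(pattern: re.Pattern, value: str) -> bool:
    """Return True if value contains at least one character the negated class matches."""
    return pattern.search(value) is not None


# The first two values are quoted in the question; "abc_123" is a hypothetical valid code.
for value in ["181xn-807199", "klciã«", "abc_123"]:
    print(f"{value!r}: original={has_disallowed(ORIGINAL, value)} "
          f"corrected={has_disallowed(CORRECTED, value)}")
```

In the original query, the corresponding condition would be `t_code ~ '[^A-Za-z0-9_-]'` (untested against the poster's server and version).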

What am I missing? Does this have something to do with bad encodings?
I want my data stored as UTF-8, but I still want to find these rows
with Latin-1 queries when a column is supposed to contain only Latin-1
characters. Or is "a-z" in UTF-8 considered different from "a-z" in
Latin-1?
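A small sketch, in Python for illustration only, of two encoding facts relevant to this question. First, ASCII letters are encoded as the same single bytes in UTF-8 and Latin-1, so a range like `a-z` covers the same characters in both. Second, a non-ASCII character stored as UTF-8 but read back as Latin-1 turns into two unrelated characters, which produces junk of the same general shape as the example above. The character "ë" is a hypothetical choice; the actual original text behind "klciã«" is unknown.

```python
# ASCII letters have identical single-byte encodings in UTF-8 and Latin-1,
# so a range like a-z covers the same characters in either encoding.
ascii_letters = "abcdefghijklmnopqrstuvwxyz"
assert ascii_letters.encode("utf-8") == ascii_letters.encode("latin-1")

# Hypothetical illustration of mis-decoding: "ë" (U+00EB) is two bytes in UTF-8,
# and reading those two bytes as Latin-1 yields two separate characters.
utf8_bytes = "ë".encode("utf-8")
mojibake = utf8_bytes.decode("latin-1")
print(utf8_bytes, mojibake)  # prints: b'\xc3\xab' Ã«
```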
