On Tue, Jul 21, 2009 at 6:35 PM, Sam Mason<sam@xxxxxxxxxxxxx> wrote: > On Tue, Jul 21, 2009 at 09:37:04AM +0200, Daniel Verite wrote: >> >I'd love to fix them. But if I do a search for >> >SELECT * FROM xyz WHERE col like '%0x80%' >> > >> >it doesn't work. How should I search for these characters? >> >> In 8.2, try: WHERE strpos(col, E'\x80') > 0 >> >> Note that this may find valid data as well, because the error you get >> is when 0x80 is the first byte of a character in UTF8; when it's at >> another position, you don't want to change it. > > There are various regexs around to check for valid UTF-8 encoding; one > appears to be: > > http://keithdevens.com/weblog/archive/2004/Jun/29/UTF-8.regex > > One translation into PG would be: > > WHERE NOT col ~ ( '^('|| > $$[\09\0A\0D\x20-\x7E]|$$|| -- ASCII > $$[\xC2-\xDF][\x80-\xBF]|$$|| -- non-overlong 2-byte > $$\xE0[\xA0-\xBF][\x80-\xBF]|$$|| -- excluding overlongs > $$[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|$$|| -- straight 3-byte > $$\xED[\x80-\x9F][\x80-\xBF]|$$|| -- excluding surrogates > $$\xF0[\x90-\xBF][\x80-\xBF]{2}|$$|| -- planes 1-3 > $$[\xF1-\xF3][\x80-\xBF]{3}|$$|| -- planes 4-15 > $$\xF4[\x80-\x8F][\x80-\xBF]{2}$$|| -- plane 16 > '*)$' ); > > This seems to do the right thing for me in an SQL_ASCII database. > I tried this. Get an error. mypg=# select * from interesting WHERE NOT description ~ ( '^('|| mypg(# $$[\09\0A\0D\x20-\x7E]|$$|| -- ASCII mypg(# $$[\xC2-\xDF][\x80-\xBF]|$$|| -- non-overlong 2-byte mypg(# $$\xE0[\xA0-\xBF][\x80-\xBF]|$$|| -- excluding overlongs mypg(# $$[\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}|$$|| -- straight 3-byte mypg(# $$\xED[\x80-\x9F][\x80-\xBF]|$$|| -- excluding surrogates mypg(# $$\xF0[\x90-\xBF][\x80-\xBF]{2}|$$|| -- planes 1-3 mypg(# $$[\xF1-\xF3][\x80-\xBF]{3}|$$|| -- planes 4-15 mypg(# $$\xF4[\x80-\x8F][\x80-\xBF]{2}$$|| -- plane 16 mypg(# '*)$' ) mypg-# mypg-# ; ERROR: invalid regular expression: quantifier operand invalid -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general