Hi Andrew,
Thanks for your great reply.
Andrew Gierth wrote:
"Phil" == Phil Endecott <spam_from_pgsql_lists@xxxxxxxxxxxx> writes:
Phil> As a hack I tried ALTER FUNCTION to make it immutable,
A better approach is to wrap it in a function of your own which is
declared immutable, rather than hacking the catalogs:
create function from_utf8(bytea) returns text language plpgsql immutable
as $$ begin return convert_from($1, 'UTF8'); end; $$;
Thanks. I'm a bit surprised that it will allow me to declare a
function immutable if it calls functions that aren't themselves
immutable, but if it works... great.
Phil> Anyway: given the problem of creating a text search index over
Phil> bytea data that contains UTF-8 text, which may include oddities
Phil> like null bytes, what would you do?
You can search for 0x00 in a bytea using position() or LIKE. What do you
want to do with values that contain null bytes? or values which you
think are supposed to be valid utf8 text but are not?
As long as it doesn't crash I don't really care; it would be better
if the text search worked for the valid parts of the text but even
that isn't essential.
I think I will probably need to do some more preprocessing on the
data when I load it, at least to remove the null bytes.
Regards, Phil.