On Fri, Jul 07, 2006 at 04:29:51AM -0700, badlydrawnbhoy <badlydrawnbhoy@xxxxxxxxx> wrote a message of 48 lines which said: > I've got a database of URLs, and when inserting new data into it I > want to make sure that there are no functionally equivalent URLs > already present. For example, 'umist.ac.uk' is functionally the same > as 'umist.ac.uk/'. IMHO, your problem seems to be an instance of a very general class: data that needs canonicalization. For instance, I work for a domain name registry and domain names are case-insensitive. Do we store the domain names blindly and then always use ILIKE or other case-insensitive operators? No, we canonicalize domain names by changing them to lower-case. That way: * we do not have to think of using case-insensitive operators, * indexes do work. This is what I recommend here: decide on a canonical form and canonicalize everything when it comes into the database (it is easy to do it from a trigger). If the original form is important, you can still store it in a column intended for display, not for searching. Here is a way to canonicalize, with a trigger. The function "canonicalize" is left as an exercice (you can write it in PL/pgSQL, C, Python, etc): CREATE OR REPLACE FUNCTION force_canonicalization() RETURNS TRIGGER AS 'BEGIN NEW.url = canonicalize(NEW.url); RETURN NEW; END;' LANGUAGE PLPGSQL; CREATE TRIGGER force_canonicalization BEFORE INSERT ON Things FOR EACH ROW EXECUTE PROCEDURE force_canonicalization();