Pavel Stehlule wrote: >> >> Hi, >> >> >> >> I am trying to match items from 2 tables based on a common string. >> >> One is a big table which has one column with entries like XY123, ABC44, = >> etc >> >> The table has an index on that column. >> >> The second table is, typically, much smaller >> >> >> >> select .... from tab1, tab2 where tab1.code =3D tab2.code; >> >> >> >> This works fine and fast. >> >> Now, as a variant, I have some entries like XY423A, XY423B, GF55A, GF55D= >> in the >> >> big table and want them to match XY423, GF55 in the second table >> >> >> >> Variants I have tried >> >> >> >> select .... from tab1, tab2 where tab1.code ~ (tab2.code||'($|[A-Z])'); >> >> select .... from tab1, tab2 where tab1.code ~ ('^'||tab2.code||'($|[A-Z= >> ])'); >> >> >> > >> > Have you tried the substring function? >> > >> > select .... from tab1, tab2 where substring(tab1.code from 1 for 5) =3D >> > tab2.code >> > Hi Pavel, it was just by chance that a fixed size substring would match the data at hand. It is more common to have a digit/letter (or vice versa) boundary or a hyphen there >> > >> >> both take an enormous time. In the better case that I can subset (e.g. a= >> ll candidates in table 2 >> >> share initial "AX") I get back to manageable times by adding >> >> and tab1.code ~ '^AX' >> >> into the recipe. Actual runtime with about a million entries in tab1 and= >> 800 entries in tab2 >> >> is about 40 seconds. >> >> any join where result is related to some function result can be very >> slow, because estimation will be out and any repeated function >> evaluation is just expensive. >> I see the problem since obviously every the ~ operator with a non-constant pattern is constantly recompiling the pattern. I wonder whether it would be possible to invent a prefix-match operator that approaches the performance of string equality. I noted in the past (not sure whether anything has changed in regex matching) that a constant leading part of regex would improve performance, i.e. use an index scan to select possible candidates. >> You can try use a functional index. >> >> create index on tab2 ((substring(tab1.code from 1 for 5)) >> What kind of trick is that - mixing two tables into a functional index? What would the exact syntax be for that? Regards Wolfgang Hamann -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general