On 23 September 2011 14:29, <hamann.w@xxxxxxxxxxx> wrote:
>
> Alban Hertroys wrote:
>
>> So you're comparing a variable field value to a variable pattern - yeah,
>> that's going to hurt. There's no way you could index exactly that.
>>
>> Perhaps there's some way you can transform the problem so that you get
>> something indexable?
>> For example, if your match patterns follow a certain pattern by themselves,
>> you could add a column with the longest match pattern that would match the
>> string. Then you could just do a query for which records have the match
>> pattern (in that new column) that you're looking for and voila!
>>
>> Whether something like that is possible strongly depends on what kind of
>> match patterns you're using, of course.
>
> Hi Alban,
>
> I already did that - the test set is just all records from the real table
> (about a million entries) that match the common 'ABC' prefix

I think you misunderstood what I wrote. Notice the difference between
"which strings match the pattern" and "which records have the match
pattern (in that new column)" - the first is a regular expression match
(unindexable), while the second is a string equality match (indexable).

What I'm suggesting is to add a column, which for the string 'ABCDEFG'
would contain 'ABC%'. Data would look like:

SELECT str, pattern FROM tbl;
   str   | pattern
---------+---------
 ABCDEFG | ABC%
 ABCDEF  | ABC%
 BCDEFGH | BCD%
etc.

When you look for records that match the pattern 'ABC%', you would
normally perform a query like:

SELECT str FROM tbl WHERE str LIKE 'ABC%';

But with this new column, you would query:

SELECT str FROM tbl WHERE pattern = 'ABC%';

As I said, it depends a lot on your pattern needs whether this solution
would work at all for you. If you only ever use a few patterns, it will
work. If you use many different patterns or don't know beforehand which
patterns will be used, it won't work well at all.

> The main difference is: the fast query looks like
>
> explain select items.num, wantcode from items, n where code = wantcode;
>  Merge Join  (cost=53.56..1104.02 rows=39178 width=36)
>    Merge Cond: (("outer".code)::text = "inner".wantcode)
>    ->  Index Scan using itemsc on items  (cost=0.00..438.75 rows=9614 width=42)
>    ->  Sort  (cost=53.56..55.60 rows=815 width=32)
>          Sort Key: n.wantcode
>          ->  Seq Scan on n  (cost=0.00..14.15 rows=815 width=32)

Is there an index on wantcode? If you have a million or more records, I
would expect an index scan for a measly 815 matches...

> and the slow ones look like this one:
>
>  Nested Loop  (cost=14.15..176478.01 rows=39178 width=36)
>    Join Filter: (("outer".code)::text ~ "inner".wantcode)
>
> So the database takes an entirely different approach to retrieving the
> entries.

Yes, because you're still using ~ there, with a pattern that's unknown at
query planning time. That will only be fast under some fairly rare
circumstances.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
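
P.S. Here is a minimal, self-contained sketch of the pattern-column idea
above; the DDL, the index name and the sample rows are placeholders made up
for illustration, not anything from your actual schema:

CREATE TABLE tbl (
    str     text NOT NULL,
    pattern text NOT NULL   -- pre-computed: the match pattern this row falls under
);

INSERT INTO tbl (str, pattern) VALUES
    ('ABCDEFG', 'ABC%'),
    ('ABCDEF',  'ABC%'),
    ('BCDEFGH', 'BCD%');

-- Equality on the pre-computed column is a plain btree lookup:
CREATE INDEX tbl_pattern_idx ON tbl (pattern);

SELECT str FROM tbl WHERE pattern = 'ABC%';

-- The same holds for a join against a table of wanted patterns: an equality
-- condition on tbl.pattern lets the planner use an index or merge join,
-- whereas "code ~ wantcode" has to re-evaluate the pattern for every row pair.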