Search Postgresql Archives

Re: looking for a faster way to do that

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 25 Sep 2011, at 8:04, hamann.w@xxxxxxxxxxx wrote:
> Hi Eduardo,
> 
> it is clear that scanning the table once with a list of matches will outperform
> rescanning the table for every string wanted. Now, my problem is that the patterns are
> dynamic as well. So if I could translate a table with one column  and a few thousand rows
> into something like
> regexp_matches(code,'string1|string2|.....string2781')
> would ideally be a performant query. Unfortunately I have no idea how I could achieve this
> transformation inside the database. Doing it externally fails, because any single query cannot
> be more than so few characters.



To me it sounds a little bit like you're comparing every item in a warehouse to a set of descriptions to see what type of item it is, which is something you would be much better off storing as a property of the item. If an item is a fruit, store that it's a fruit!
But I'm guessing at what you're trying to accomplish, so here's a few other options...

I guess you could create 2781 expression indexes to do what you want (is there a limit that prevents this?). Query planning would probably become kind of slow and the indices will take up a considerable fraction of the total table storage required - that's a pretty outlandish approach.

CREATE INDEX tbl_str_regex1 ON tbl (regexp_matches(str, 'string1'));
CREATE INDEX tbl_str_regex1 ON tbl (regexp_matches(str, 'string2'));
...
CREATE INDEX tbl_str_regex1 ON tbl (regexp_matches(str, 'string2781'));

Or are you really going to query every record against all 2781 regexes? I can't figure out a realistic scenario why you (or anyone) would want that.
In that case those indices aren't going to help you much, as the planner would have to hold every record in tbl to each index - it won't do that.


You could also create a giant lookup table (a materialized view, if you like) where the results of every match of str in tbl against the wantcode in the regex table is stored. That's some huge overhead, but it will probably outperform most other options. With the numbers you gave that table will hold about 2-3 billion records with two foreign key values and a truth value each.


Alban Hertroys

--
The scale of a problem often equals the size of an ego.



-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux