At 18:18 26/09/2011, you wrote:
Eduardo Morras <nec556@xxxxxxxxxx> wrote:
>>
>> At 08:04 25/09/2011, hamann.w@xxxxxxxxxxx wrote:
>>
>> > >> select * from items where regexp_matches(items.code,'(ABC) (DE1)
>> > >> (any_substring)')<>{};
>> > >>
>> >
>> >Hi Eduardo,
>> >
>> >it is clear that scanning the table once with a list of matches will
>> >outperform
>> >rescanning the table for every string wanted. Now, my problem is
>> >that the patterns are
>> >dynamic as well. So if I could translate a table with one
>> >column and a few thousand rows
>> >into something like
>> >regexp_matches(code,'string1|string2|.....string2781')
>> >would ideally be a performant query. Unfortunately I have no idea
>> >how I could achieve this
>> >transformation inside the database. Doing it externally fails,
>> >because any single query cannot
>> >be more than so few characters.
>>
>> You can create a plsql function and pass a setof text that do it.
>> Sorry but instead saying you What Must You Type, WMYT(c), i prefer
>> the How Should You Do way, HSYD(c). Note that you can get the same
>> results using other approachs (f.ex. using FTS described in chapter 12)
>>
>> Check this topics:
>>
>> Function
>>
Creation http://www.postgresql.org/docs/9.0/static/sql-createfunction.html
>>
>> Tutorial about Function
>> Creation http://www.adderpit.com/practical-postgresql/x10374.htm
>>
Hi,
I tried the pl/sql approach to convert the contents of that patterns
table into a regex.
Results: 40 seconds runtime for 9500 candidates and 815 patterns
718 seconds for the same set of 9500 candidates, but using 4000
patterns instead.
So it seems that I am reaching limits of pattern match
Perhaps calling the function twice with half the values go faster.
How do you call the function? EXECUTE or SELECT? If you use EXECUTE
then the prepared plan in a previous call is ignored and is usually
faster. Don't know if in your case it run faster but you can try it.
As for the fulltext index (and the underlying tsquery): this is an
exact match rather than prefix
match, so I would need to know match patterns in advance in order to
build the index
I am thinking about that anyway (because ABC1234 likely should not
match ABC123 pattern
in my context), but I would sort of prefer a system where I can
state the rules when I
see the data set, rather than having to pre-create an index.
Thanks for the tutorial link :)
It's the one i used time ago. A bit old but very good one.
It seems that the responses on my post give all sorts of input that
will help me on other
tasks
Regards
Wolfgang Hamann
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general