Re: looking for a faster way to do that

Eduardo Morras <nec556@xxxxxxxxxx> · Mon, 26 Sep 2011 18:56:43 +0200

At 18:18 26/09/2011, you wrote:

Eduardo Morras <nec556@xxxxxxxxxx> wrote:

>>
>> At 08:04 25/09/2011, hamann.w@xxxxxxxxxxx wrote:
>>
>> > >> select * from items where regexp_matches(items.code,'(ABC) (DE1)
>> > >> (any_substring)')<>{};
>> > >>
>> >
>> >Hi Eduardo,
>> >
>> >it is clear that scanning the table once with a list of matches will
>> >outperform
>> >rescanning the table for every string wanted. Now, my problem is
>> >that the patterns are
>> >dynamic as well. So if I could translate a table with one
>> >column  and a few thousand rows
>> >into something like
>> >regexp_matches(code,'string1|string2|.....string2781')
>> >would ideally be a performant query. Unfortunately I have no idea
>> >how I could achieve this
>> >transformation inside the database. Doing it externally fails,
>> >because any single query cannot
>> >be more than so few characters.
>>
>> You can create a plsql function and pass a setof text that do it.
>> Sorry but instead saying you What Must You Type, WMYT(c), i prefer
>> the How Should You Do way, HSYD(c). Note that you can get the same
>> results using other approachs (f.ex. using FTS described in chapter 12)
>>
>> Check this topics:
>>
>> Function
>> 
Creation  http://www.postgresql.org/docs/9.0/static/sql-createfunction.html
>>
>> Tutorial about Function
>> Creation  http://www.adderpit.com/practical-postgresql/x10374.htm
>>

Hi,

I tried the pl/sql approach to convert the contents of that patterns 
table into a regex.
Results: 40 seconds runtime for 9500 candidates and 815 patterns
718 seconds for the same set of 9500 candidates, but using 4000 
patterns instead.
So it seems that I am reaching limits of pattern match

Perhaps calling the function twice with half the values go faster. 
How do you call the function? EXECUTE or SELECT? If you use EXECUTE 
then the prepared plan in a previous call is ignored and is usually 
faster. Don't know if in your case it run faster but you can try it.

As for the fulltext index (and the underlying tsquery): this is an 
exact match rather than prefix
match, so I would need to know match patterns in advance in order to 
build the index

I am thinking about that anyway (because ABC1234 likely should not 
match ABC123 pattern
in my context), but I would sort of prefer a system where I can 
state the rules when I
see the data set, rather than having to pre-create an index.

Thanks for the tutorial link :)

It's the one i used time ago. A bit old but very good one.

It seems that the responses on my post give all sorts of input that 
will help me on other
tasks

Regards
Wolfgang Hamann

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general