Re: looking for a faster way to do that

Eduardo Morras <nec556@xxxxxxxxxx> · Sat, 24 Sep 2011 16:28:06 +0200

At 14:12 23/09/2011, hamann.w@xxxxxxxxxxx wrote:
Eduardo Morras wrote:

>> You can try these, i doubt they will use any index but its a
>> different approach:
>>
>> select * from items where 
length(items.code)<>length(rtrim(items.code,'ABC'));
>>
>> select * from items where strpos(items.code,'ABC')=0 or
>> strpos(items.code,'any_substring')=0;

Hi,

if I understand this right, it does not mean "check if the string 
appears at position 0"
which could translate into an index query, but rather "check if the 
string appears anywhere
and then check if that is position 0", so the entire table is checked.

The second one yes, as it checks all patterns you want only one time 
per row they only needs one table scan. The first one eliminates the 
substring 'ABC' from the string, if the lengths of both strings are 
equal, the substring 'ABC' wasn't in it. If they are different, the 
trimmed string will be shorter.

explain analyze select items.num, wantcode from items, n where 
strpos(code, wantcode) = 0;
 Nested Loop  (cost=167.14..196066.54 rows=39178 width=36) (actual 
time=0.074..36639.312 rows=7832539 loops=1)
   Join Filter: (strpos(("inner".code)::text, "outer".wantcode) = 0)
   ->  Seq Scan on n  (cost=0.00..14.15 rows=815 width=32) (actual 
time=0.005..2.212 rows=815 loops=1)
   ->  Materialize  (cost=167.14..263.28 rows=9614 width=42) 
(actual time=0.007..13.970 rows=9614 loops=815)
         ->  Seq Scan on items  (cost=0.00..167.14 rows=9614 
width=42) (actual time=0.044..14.855 rows=9614 loops=1)
 Total runtime: 46229.836 ms

The query ran much faster than the pattern query, however. This 
seems to be the performance
of just searching for a plain string vs. initializing the regex 
engine every time (for 815
queries in a test set)

It will do only one table scan while your original code will do one 
for each substring you want to test. You can add more and more 
substrings without too much cost. If you want to use the regex engine 
instead the postgresql string funtions check the regexp_matches(), it 
should be faster if you have 3000 substrings.

select * from items where regexp_matches(items.code,'(ABC) (DE1) 
(any_substring)')<>{};

Regards
Wolfgang Hamann

HTH 

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general