Search Postgresql Archives

BUG? Regular expression matching of optional character group at beginning of RE

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

I've noticed that regular expressions which are anchored at the
beginning of the text but have an optional part at the beginning
(e.g. '^(ge)?kommen$') are not evaluated correctly if there is an
index on the column.

Consider the following table:

#+BEGIN_SRC sql
    CREATE TABLE annotation ( 
       id SERIAL PRIMARY KEY, 
       name VARCHAR(20), 
       value VARCHAR(20) 
    );
#+END_SRC

And the following query:

#+BEGIN_SRC sql
SELECT count(*) FROM annotation WHERE name = 'lemma' AND value ~ '^(ge)?kommen$';
#+END_SRC

In my data set, this query should return 916 results; 911 rows match
'kommen' and 5 rows match 'gekommen'.

Here is the plan with an index on column name:

#+BEGIN_EXAMPLE
                                       QUERY PLAN                                       
----------------------------------------------------------------------------------------
 Aggregate  (cost=20101.31..20101.32 rows=1 width=0)
   ->  Bitmap Heap Scan on annotation  (cost=282.55..20101.31 rows=2 width=0)
         Recheck Cond: ((name)::text = 'lemma'::text)
         Filter: ((value)::text ~ '^(ge)?kommen$'::text)
         ->  Bitmap Index Scan on idx_test_name  (cost=0.00..282.55 rows=15196 width=0)
               Index Cond: ((name)::text = 'lemma'::text)
#+END_EXAMPLE

This plan correctly retrieves 916 rows. However, if I create an index
on the column value, the plan is as follows:

#+BEGIN_EXAMPLE
                                              QUERY PLAN                                              
------------------------------------------------------------------------------------------------------
 Aggregate  (cost=910.50..910.51 rows=1 width=0)
   ->  Bitmap Heap Scan on annotation  (cost=619.38..910.49 rows=2 width=0)
         Recheck Cond: ((name)::text = 'lemma'::text)
         Filter: ((value)::text ~ '^(ge)?kommen$'::text)
         ->  BitmapAnd  (cost=619.38..619.38 rows=76 width=0)
               ->  Bitmap Index Scan on idx_test_name  (cost=0.00..282.55 rows=15196 width=0)
                     Index Cond: ((name)::text = 'lemma'::text)
               ->  Bitmap Index Scan on idx_test_value  (cost=0.00..336.58 rows=15196 width=0)
                     Index Cond: (((value)::text ~>=~ 'ge'::text) AND ((value)::text ~<~ 'gf'::text))
#+END_EXAMPLE

This plan only retrieves the 5 rows matching 'gekommen'. Note the
usage of the index on the value column even though the first character
of the regular expression is not fixed.

The index creation command was:

#+BEGIN_SRC sql
CREATE INDEX idx_test_value ON annotation (value varchar_pattern_ops);
#+END_SRC

Note that I can force the correct evaluation of the regular expression
by prepending '(?e)'; however, this should not be necessary if I
understand the documentation correctly.

#+BEGIN_SRC sql
SELECT count(*) FROM annotation WHERE name = 'lemma' AND value ~ '(?e)^(ge)?kommen$';
#+END_SRC

I'm using PostgreSQL 9.1.4.

Cheers,
Viktor

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux