Re: Regexps - never completing join.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 






On May 13, 2008, at 11:45 PM, Rusty Conover wrote:

Hi Guys,

I'm using postgresql 8.3.1 and I'm seeing weird behavior between what I expect and what's happening when the query is executed

I'm trying to match a table that contains regexps against another table that is full of the text to match against so my query is:

select wc_rule.id from classifications, wc_rule where classifications.classification ~* wc_rule.regexp;

When I run that the query takes a very very long time (never ending so far 20 minutes or so) to execute.

But if I loop through all of the rules and a query for each rule:

select wc_rule.id from classifications, wc_rule where classifications.classification ~* wc_rule.regexp and wc_rule.id = ?

All of the rules when run individually can be matched in a little under then 3 minutes. I'd assume postgres would be equal to or faster with the single row execution method.

The table schema:

CREATE TABLE wc_rule (
   id integer NOT NULL,
   regexp text,
);

CREATE TABLE classifications (
   id integer NOT NULL,
   classification text NOT NULL
);

gb_render_1_db=# explain select wc_rule.id from classifications, wc_rule where classifications.classification ~* wc_rule.regexp;
                                QUERY PLAN
-----------------------------------------------------------------------------
Nested Loop  (cost=13.71..891401.71 rows=197843 width=4)
  Join Filter: (classifications.classification ~* wc_rule.regexp)
-> Seq Scan on classifications (cost=0.00..1093.46 rows=56446 width=42)
  ->  Materialize  (cost=13.71..20.72 rows=701 width=22)
        ->  Seq Scan on wc_rule  (cost=0.00..13.01 rows=701 width=22)
(5 rows)




As a followup I did some digging:

by editing:

src/backend/utils/adt/regexp.c

and increasing the cache size for regular expressions to an arbitrarily large number

#define MAX_CACHED_RES  3200

Rather then the default of

#define MAX_CACHED_RES  32

I was able to get the query to complete in a respectable amount of time:

gb_render_1_db=# explain analyze select wc_rule.id from classifications, wc_rule where classifications.classification ~* wc_rule.regexp;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=13.71..891401.71 rows=197843 width=4) (actual time=72.714..366899.913 rows=55052 loops=1)
   Join Filter: (classifications.classification ~* wc_rule.regexp)
-> Seq Scan on classifications (cost=0.00..1093.46 rows=56446 width=42) (actual time=28.820..109.895 rows=56446 loops=1) -> Materialize (cost=13.71..20.72 rows=701 width=22) (actual time=0.000..0.193 rows=701 loops=56446) -> Seq Scan on wc_rule (cost=0.00..13.01 rows=701 width=22) (actual time=0.030..0.593 rows=701 loops=1)
 Total runtime: 366916.632 ms
(6 rows)

Which is still > 6 minutes, but at least it completed.

I'll keep digging into what is causing this bad performance.

Thanks,

Rusty
--
Rusty Conover
InfoGears Inc.
http://www.infogears.com



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux