Re: Regexps - never completing join.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Returning to this problem this morning, I made some more insight.

The regexp cache isn't getting very many hits because the executor is looping through all of the classification rows then looping through all of the regular expressions, causing each expression to be recompiled every time since the cache limit is only for 32 cached regular expressions. You can think of the behavior like:

foreach classification {
  foreach regexp {
    do match
  }
}

Obviously to make this perform better without requiring a bigger regexp cache I'd like it to run like:

foreach regexp {
  foreach classification {
    do match
  }
}

That way the cache wouldn't have to be very big at all since the last used regular expression would be at the top of the cache.

Various methods of changing the query don't seem to have the desired effect. Even with setting join_collapse_limit to 1.

select wc_rule.id from wc_rule cross join classifications on classifications.classification ~* wc_rule.regexp;

                                 QUERY PLAN
-----------------------------------------------------------------------------
 Nested Loop  (cost=13.71..891401.71 rows=197843 width=4)
   Join Filter: (classifications.classification ~* wc_rule.regexp)
-> Seq Scan on classifications (cost=0.00..1093.46 rows=56446 width=42)
   ->  Materialize  (cost=13.71..20.72 rows=701 width=22)
         ->  Seq Scan on wc_rule  (cost=0.00..13.01 rows=701 width=22)
(5 rows)



select wc_rule.id from classifications cross join wc_rule on classifications.classification ~* wc_rule.regexp;

                                 QUERY PLAN
-----------------------------------------------------------------------------
 Nested Loop  (cost=13.71..891401.71 rows=197843 width=4)
   Join Filter: (classifications.classification ~* wc_rule.regexp)
-> Seq Scan on classifications (cost=0.00..1093.46 rows=56446 width=42)
   ->  Materialize  (cost=13.71..20.72 rows=701 width=22)
         ->  Seq Scan on wc_rule  (cost=0.00..13.01 rows=701 width=22)
(5 rows)



Both of those queries execute in the same looping order, there doesn't seem to be a control to say use this table as the inner table and this table as the outer table for the join that I could find.

One way I did find that worked to control the loop (but doesn't yield the same results because its a left join)

select wc_rule.id from wc_rule left join classifications on classifications.classification ~* wc_rule.regexp;

                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=1149.91..891457.45 rows=197843 width=4) (actual time=0.627..149051.505 rows=55126 loops=1)
   Join Filter: (classifications.classification ~* wc_rule.regexp)
-> Seq Scan on wc_rule (cost=0.00..13.01 rows=701 width=22) (actual time=0.030..1.272 rows=701 loops=1) -> Materialize (cost=1149.91..1714.37 rows=56446 width=42) (actual time=0.001..14.244 rows=56446 loops=701) -> Seq Scan on classifications (cost=0.00..1093.46 rows=56446 width=42) (actual time=0.022..29.913 rows=56446 loops=1)
 Total runtime: 149067.764 ms
(6 rows)

Thanks,

Rusty
--
Rusty Conover
InfoGears Inc.
http://www.infogears.com








[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux