Regexps - never completing join.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Guys,

I'm using postgresql 8.3.1 and I'm seeing weird behavior between what I expect and what's happening when the query is executed

I'm trying to match a table that contains regexps against another table that is full of the text to match against so my query is:

select wc_rule.id from classifications, wc_rule where classifications.classification ~* wc_rule.regexp;

When I run that the query takes a very very long time (never ending so far 20 minutes or so) to execute.

But if I loop through all of the rules and a query for each rule:

select wc_rule.id from classifications, wc_rule where classifications.classification ~* wc_rule.regexp and wc_rule.id = ?

All of the rules when run individually can be matched in a little under then 3 minutes. I'd assume postgres would be equal to or faster with the single row execution method.

The table schema:

CREATE TABLE wc_rule (
    id integer NOT NULL,
    regexp text,
);

CREATE TABLE classifications (
    id integer NOT NULL,
    classification text NOT NULL
);

gb_render_1_db=# explain select wc_rule.id from classifications, wc_rule where classifications.classification ~* wc_rule.regexp;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Nested Loop  (cost=13.71..891401.71 rows=197843 width=4)
   Join Filter: (classifications.classification ~* wc_rule.regexp)
-> Seq Scan on classifications (cost=0.00..1093.46 rows=56446 width=42)
   ->  Materialize  (cost=13.71..20.72 rows=701 width=22)
         ->  Seq Scan on wc_rule  (cost=0.00..13.01 rows=701 width=22)
(5 rows)


gb_render_1_db=# select count(*) from classifications;
 count
-------
 56446
(1 row)

gb_render_1_db=# select count(*) from wc_rule;
 count
-------
   701
(1 row)

I have exports of the tables up at so you can try it if you'd like.

http://rusty.devel.infogears.com/regexp-tables.tar.bz2

Any insight is greatly appreciated, even if it's just showing me how I made a mistake in the query.

Thanks,

Rusty
--
Rusty Conover
InfoGears Inc.
http://www.infogears.com


An example script that shows how each rule was run individually in perl.

$dbh->begin_work();
eval {
  my $all_rules = $dbh->selectall_arrayref("select id from wc_rule");
  foreach my $row (@$all_rules) {
    print "Doing rule: $row->[0]\n";
    eval {
      local $SIG{ALRM} = sub { die("Alarm") };
      alarm(5);
my $results = $dbh->selectall_arrayref("select wc_rule.id from classifications, wc_rule where classifications.classification ~* wc_rule.regexp and wc_rule.id = ?", undef, $row->[0]);
      alarm(0);
    };
    if ($@) {
      alarm(0);
      print "Got bad rule id of : $row->[0]\n";
      exit(0);
    }
    alarm(0);
    print "ok rule: $row->[0]\n";
  }
};
if ($@) {
  print "Failed to run rules:\n$@\n";
  $dbh->rollback();
  $dbh->disconnect();
  exit(-1);
}

$dbh->commit();
$dbh->disconnect();
exit(0);









[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux