Hi Guys,
I'm using postgresql 8.3.1 and I'm seeing weird behavior between what
I expect and what's happening when the query is executed
I'm trying to match a table that contains regexps against another
table that is full of the text to match against so my query is:
select wc_rule.id from classifications, wc_rule where
classifications.classification ~* wc_rule.regexp;
When I run that the query takes a very very long time (never ending so
far 20 minutes or so) to execute.
But if I loop through all of the rules and a query for each rule:
select wc_rule.id from classifications, wc_rule where
classifications.classification ~* wc_rule.regexp and wc_rule.id = ?
All of the rules when run individually can be matched in a little
under then 3 minutes. I'd assume postgres would be equal to or faster
with the single row execution method.
The table schema:
CREATE TABLE wc_rule (
id integer NOT NULL,
regexp text,
);
CREATE TABLE classifications (
id integer NOT NULL,
classification text NOT NULL
);
gb_render_1_db=# explain select wc_rule.id from classifications,
wc_rule where classifications.classification ~* wc_rule.regexp;
QUERY PLAN
-----------------------------------------------------------------------------
Nested Loop (cost=13.71..891401.71 rows=197843 width=4)
Join Filter: (classifications.classification ~* wc_rule.regexp)
-> Seq Scan on classifications (cost=0.00..1093.46 rows=56446
width=42)
-> Materialize (cost=13.71..20.72 rows=701 width=22)
-> Seq Scan on wc_rule (cost=0.00..13.01 rows=701 width=22)
(5 rows)
gb_render_1_db=# select count(*) from classifications;
count
-------
56446
(1 row)
gb_render_1_db=# select count(*) from wc_rule;
count
-------
701
(1 row)
I have exports of the tables up at so you can try it if you'd like.
http://rusty.devel.infogears.com/regexp-tables.tar.bz2
Any insight is greatly appreciated, even if it's just showing me how I
made a mistake in the query.
Thanks,
Rusty
--
Rusty Conover
InfoGears Inc.
http://www.infogears.com
An example script that shows how each rule was run individually in perl.
$dbh->begin_work();
eval {
my $all_rules = $dbh->selectall_arrayref("select id from wc_rule");
foreach my $row (@$all_rules) {
print "Doing rule: $row->[0]\n";
eval {
local $SIG{ALRM} = sub { die("Alarm") };
alarm(5);
my $results = $dbh->selectall_arrayref("select wc_rule.id from
classifications, wc_rule where classifications.classification ~*
wc_rule.regexp and wc_rule.id = ?", undef, $row->[0]);
alarm(0);
};
if ($@) {
alarm(0);
print "Got bad rule id of : $row->[0]\n";
exit(0);
}
alarm(0);
print "ok rule: $row->[0]\n";
}
};
if ($@) {
print "Failed to run rules:\n$@\n";
$dbh->rollback();
$dbh->disconnect();
exit(-1);
}
$dbh->commit();
$dbh->disconnect();
exit(0);