On 23 September 2011 09:45, <hamann.w@xxxxxxxxxxx> wrote:
So you're comparing a variable field value to a variable pattern - yeah, that's going to hurt. There's no way you could index exactly that.
Perhaps there's some way you can transform the problem so that you get something indexable?
For example, if your match patterns follow a certain pattern by themselves, you could add a column with the longest match pattern that would match the string. Then you could just do a query for which records have the match pattern (in that new column) that you're looking for and voila!
If something like that is possible strongly depends on what kind of match patterns you're using, of course.
That's an exact string match, of course that will be fast ;)
Alban Hertroys <haramrae@xxxxxxxxx> wrote:
>> What is the output of explain?
>>
>> You say 'the other table', so presumably we're dealing with a foreign key
>> here. Is there an index on that column?
Albe Laurenz wrote:Here I created a subset (just number and code matching a certain prefix)
>> Is the index used for "where code ~ '^ABC3563'"?
>>
>> If not, then the result is fast only because the table is scanned only once,
>> and it's just the factor of 3000 that's killing you.
>>
>> The second query (where code ~ wantcode) can never use an index because
>> the pattern "wantcode" is unknown at query planning time.
>>
>> Yours,
>> Laurenz Albe
\d items
Table "pg_temp_1.items"
Column | Type | Modifiers
--------+-----------------------+-----------
num | integer |
code | character varying(40) |
create index itemsc on items (code);
select count(*) from items;
count
-------
9614
A single anchored query
select * from items where code ~ '^ABC';
does indeed use the index to retrieve data.
Next I copied a file of wanted codes
create temp table n (wantcode text);
\copy n from /tmp/rmartin.tmp
the file contains plain names, i.e. unanchored matches
explain analyze select num, n.wantcode from items, n where items.code ~ n.wantcode;
Nested Loop (cost=20.00..216502.14 rows=48070 width=36) (actual time=148.479..336280.488 rows=2871 loops=1)
Join Filter: (("outer".code)::text ~ "inner".wantcode)
-> Seq Scan on items (cost=0.00..167.14 rows=9614 width=42) (actual time=0.048..38.666 rows=9614 loops=1)
-> Materialize (cost=20.00..30.00 rows=1000 width=32) (actual time=0.001..1.049 rows=815 loops=9614)
-> Seq Scan on n (cost=0.00..20.00 rows=1000 width=32) (actual time=0.003..1.839 rows=815 loops=1)
Total runtime: 336286.692 ms
So you're comparing a variable field value to a variable pattern - yeah, that's going to hurt. There's no way you could index exactly that.
Perhaps there's some way you can transform the problem so that you get something indexable?
For example, if your match patterns follow a certain pattern by themselves, you could add a column with the longest match pattern that would match the string. Then you could just do a query for which records have the match pattern (in that new column) that you're looking for and voila!
If something like that is possible strongly depends on what kind of match patterns you're using, of course.
An exact match "where items.code = n.wantcode" on the same data completes in 40 ms
That's an exact string match, of course that will be fast ;)
--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.