I resolved the problem by eliminating the planner from the decision
making altogether, through using a (permanently) temporary table
populated by the subset of data records from an initial query generated
from the metadata, then searching only in the temp table for the actual
data records I want. I ran this code into a function sxa ("search by
author"). In my particular case I am looking for the same snippet of
text over and over again so I hardwired it into the function. The
function uses 'like' instead of '=' on the author so I am actually
comparing an exact match using the inner join to a pattern match in the
function.
c_db=> select count(id) from content;
count
---------
1248954
(1 row)
Time: 587.325 ms
c_db=> select count(id) from data inner join metadata on data.id =
metadata.id where author = 'Powers';
count
-------
347
(1 row)
Time: 519.435 ms
c_db=> select count(id) from data inner join metadata on data.id =
metadata.id where author = 'Powers' and content like '%some text%';
count
-------
14
(1 row)
Time: 209895.655 ms
c_db=> select count(id) from sxa('Powers');
count
-------
14
(1 row)
Time: 1794.600 ms
The above function run time includes deleting the previous search
results and creating 347 records in the temp table. I also find that it
appears to sometimes run even faster depending on the search (due to
variation in content length) and can be even less than half the above time:
c_db=> select count(id) from sxa('Zelazny');
count
-------
13
(1 row)
Time: 790.551 ms
The inner join time run time variance of searches is greater than the
total time for any search with the function. Total run time for any
inner join search was always 209 to 211 seconds, as all of the content
values appear to be searched. Use of a temp table reduced search time
of 1.25m records from 3 1/2 minutes to less than two seconds and in some
cases to less than one.
In summary, the answer to how to overcome a bad decision by the
postgresql planner appears to be move the subset of data into a
temporary table and force postgresql to look in it instead. The version
of postgresql I am using is 8.4.1, admittedly old. If there is a newer
version of postgresql that has fixed this, please point me to it and I
will see if I can upgrade to it, instead of crafting functions. Also,
if someone could refresh my memory on how relational databases are
supposed to work by default, I would appreciate it. I seem to have
gotten confused somewhere.
tim
c_db=> \d t
Table "public.t"
Column | Type | Modifiers
---------+-----------------------+-----------
id | character varying(30) | not null
content | text |
Indexes:
"t_pkey" PRIMARY KEY, btree (id)
CREATE OR REPLACE FUNCTION public.sxa(author character varying)
RETURNS TABLE(id character varying)
LANGUAGE sql
AS $function$
delete from t;
insert into t (select id, content from metadata inner join data on
metadata.id = data.id where author like $1);
select id from data where id in (select id from t where content like
'%some text%');
$function$
;