Re: unoptimized nested loops

Tim Kelly <gtkelly@xxxxxxxxxxxxxxxxx> · Sat, 04 Jun 2022 07:32:18 -0400

I resolved the problem by eliminating the planner from the decision 
making altogether, through using a (permanently) temporary table 
populated by the subset of data records from an initial query generated 
from the metadata, then searching only in the temp table for the actual 
data records I want.  I ran this code into a function sxa ("search by 
author").  In my particular case I am looking for the same snippet of 
text over and over again so I hardwired it into the function.  The 
function uses 'like' instead of '=' on the author so I am actually 
comparing an exact match using the inner join to a pattern match in the 
function.

c_db=> select count(id) from content;
  count
---------
 1248954
(1 row)

Time: 587.325 ms

c_db=> select count(id) from data inner join metadata on data.id = 
metadata.id where author = 'Powers';
 count
-------
   347
(1 row)

Time: 519.435 ms

c_db=> select count(id) from data inner join metadata on data.id = 
metadata.id where author = 'Powers' and content like '%some text%';
 count
-------
    14
(1 row)

Time: 209895.655 ms

c_db=> select count(id) from sxa('Powers');
 count
-------
    14
(1 row)

Time: 1794.600 ms

The above function run time includes deleting the previous search 
results and creating 347 records in the temp table.  I also find that it 
appears to sometimes run even faster depending on the search (due to 
variation in content length) and can be even less than half the above time:

c_db=> select count(id) from sxa('Zelazny');
 count
-------
    13
(1 row)

Time: 790.551 ms

The inner join time run time variance of searches is greater than the 
total time for any search with the function.  Total run time for any 
inner join search was always 209 to 211 seconds, as all of the content 
values appear to be searched.  Use of a temp table reduced search time 
of 1.25m records from 3 1/2 minutes to less than two seconds and in some 
cases to less than one.

In summary, the answer to how to overcome a bad decision by the 
postgresql planner appears to be move the subset of data into a 
temporary table and force postgresql to look in it instead.  The version 
of postgresql I am using is 8.4.1, admittedly old.  If there is a newer 
version of postgresql that has fixed this, please point me to it and I 
will see if I can upgrade to it, instead of crafting functions.  Also, 
if someone could refresh my memory on how relational databases are 
supposed to work by default, I would appreciate it.  I seem to have 
gotten confused somewhere.

tim

c_db=> \d t
              Table "public.t"
 Column  |         Type          | Modifiers
---------+-----------------------+-----------
 id      | character varying(30) | not null
 content | text                  |
Indexes:
    "t_pkey" PRIMARY KEY, btree (id)

CREATE OR REPLACE FUNCTION public.sxa(author character varying)
 RETURNS TABLE(id character varying)
 LANGUAGE sql
AS $function$
delete from t;
insert into t (select id, content from metadata inner join data on 
metadata.id = data.id where author like $1);
select id from data where id in (select id from t where content like 
'%some text%');
$function$
;