Search Postgresql Archives

Re: unoptimized nested loops

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I resolved the problem by eliminating the planner from the decision making altogether, through using a (permanently) temporary table populated by the subset of data records from an initial query generated from the metadata, then searching only in the temp table for the actual data records I want. I ran this code into a function sxa ("search by author"). In my particular case I am looking for the same snippet of text over and over again so I hardwired it into the function. The function uses 'like' instead of '=' on the author so I am actually comparing an exact match using the inner join to a pattern match in the function.

c_db=> select count(id) from content;
  count
---------
 1248954
(1 row)

Time: 587.325 ms

c_db=> select count(id) from data inner join metadata on data.id = metadata.id where author = 'Powers';
 count
-------
   347
(1 row)

Time: 519.435 ms

c_db=> select count(id) from data inner join metadata on data.id = metadata.id where author = 'Powers' and content like '%some text%';
 count
-------
    14
(1 row)

Time: 209895.655 ms

c_db=> select count(id) from sxa('Powers');
 count
-------
    14
(1 row)

Time: 1794.600 ms

The above function run time includes deleting the previous search results and creating 347 records in the temp table. I also find that it appears to sometimes run even faster depending on the search (due to variation in content length) and can be even less than half the above time:

c_db=> select count(id) from sxa('Zelazny');
 count
-------
    13
(1 row)

Time: 790.551 ms

The inner join time run time variance of searches is greater than the total time for any search with the function. Total run time for any inner join search was always 209 to 211 seconds, as all of the content values appear to be searched. Use of a temp table reduced search time of 1.25m records from 3 1/2 minutes to less than two seconds and in some cases to less than one.

In summary, the answer to how to overcome a bad decision by the postgresql planner appears to be move the subset of data into a temporary table and force postgresql to look in it instead. The version of postgresql I am using is 8.4.1, admittedly old. If there is a newer version of postgresql that has fixed this, please point me to it and I will see if I can upgrade to it, instead of crafting functions. Also, if someone could refresh my memory on how relational databases are supposed to work by default, I would appreciate it. I seem to have gotten confused somewhere.

tim



c_db=> \d t
              Table "public.t"
 Column  |         Type          | Modifiers
---------+-----------------------+-----------
 id      | character varying(30) | not null
 content | text                  |
Indexes:
    "t_pkey" PRIMARY KEY, btree (id)

CREATE OR REPLACE FUNCTION public.sxa(author character varying)
 RETURNS TABLE(id character varying)
 LANGUAGE sql
AS $function$
delete from t;
insert into t (select id, content from metadata inner join data on metadata.id = data.id where author like $1); select id from data where id in (select id from t where content like '%some text%');
$function$
;





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux