unoptimized nested loops

Hello Everyone,
I am stumped as to what I am doing wrong. I have two tables:

metadata: parent table, 1.28m records
data: child table, 1.24m records

metadata contains descriptions of the records in data. data has two fields of concern: the id field, which is a foreign key referencing the identically named field in metadata, and the content field, which contains text ranging from a few hundred to a few thousand characters. The id fields are alphanumeric for historical reasons. Table descriptions are below, with some fields omitted for brevity:


c_db=> \d metadata
  Table "public.metadata"
      Column      |           Type           | Modifiers
------------------+--------------------------+-----------
 id               | character varying(60)    | not null
 author           | character varying(90)    |

Indexes:
    "metadata_pkey" PRIMARY KEY, btree (id)
Referenced by:
TABLE "data" CONSTRAINT "fk_metadata" FOREIGN KEY (id) REFERENCES metadata(id)


c_db=> \d data
            Table "public.data"
 Column  |          Type           | Modifiers
---------+-------------------------+-----------
 id      | character varying(30)   | not null
 content | text                    |

Indexes:
    "data_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "fk_metadata" FOREIGN KEY (id) REFERENCES metadata(id)




I regularly need to search the content field in data for short text segments. I can reduce the number of rows that need to be searched by first selecting particular kinds of entries in metadata. However, postgresql apparently refuses to do so and insists on searching the entire content column in the data table for the text segment of interest. The query that restricts the search through metadata actually takes longer than searching the entire data table straight up.

To be specific, here are the two approaches:

c_db=> select count(id) from data;
  count
---------
 1248954
(1 row)

Time: 648.358 ms


c_db=> select count(id) from data where content like '%some text%';
  count
---------
 1167810
(1 row)

Time: 180144.251 ms


c_db=> select count(id) from metadata where author like '%Kelly%';

 count
-------
  3558
(1 row)

Time: 1625.455 ms


c_db=> select count(id) from data where data.content like '%some text%' and data.id in (select id from metadata where metadata.author like '%Kelly%');

 count
-------
   152
(1 row)

Time: 211010.598 ms


c_db=> explain analyze select count(id) from data where data.content like '%some text%' and data.id in (select id from metadata where metadata.author like '%Kelly%');

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=48203.30..48203.31 rows=1 width=124) (actual time=213021.968..213021.969 rows=1 loops=1)
   ->  Nested Loop Semi Join  (cost=0.00..48202.99 rows=125 width=124) (actual time=51392.697..213021.848 rows=152 loops=1)
         ->  Seq Scan on data  (cost=0.00..47132.93 rows=125 width=135) (actual time=0.176..183040.366 rows=1167810 loops=1)
               Filter: (content ~~ '%some text%'::text)
         ->  Index Scan using metadata_pkey on metadata  (cost=0.00..8.55 rows=1 width=11) (actual time=0.025..0.025 rows=0 loops=1167810)
               Index Cond: ((metadata.id)::text = (data.id)::text)
               Filter: ((metadata.author)::text ~~ '%Kelly%'::text)
 Total runtime: 213022.028 ms
(8 rows)

Time: 213024.273 ms


Alternatively, using an inner join,


c_db=> select count(id) from data inner join metadata on data.id = metadata.id where data.content like '%some text%' and metadata.author like '%Kelly%';

 count
-------
   152
(1 row)

Time: 212211.047 ms

c_db=> explain analyze select count(id) from data inner join metadata on data.id = metadata.id where data.content like '%some text%' and metadata.author like '%Kelly%';

QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=48203.30..48203.31 rows=1 width=124) (actual time=212800.026..212800.026 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..48202.99 rows=125 width=124) (actual time=22463.361..212799.911 rows=152 loops=1)
         ->  Seq Scan on data  (cost=0.00..47132.93 rows=125 width=135) (actual time=0.542..182952.708 rows=1167810 loops=1)
               Filter: (content ~~ '%some text%'::text)
         ->  Index Scan using metadata_pkey on metadata  (cost=0.00..8.55 rows=1 width=11) (actual time=0.025..0.025 rows=0 loops=1167810)
               Index Cond: ((metadata.id)::text = (data.id)::text)
               Filter: ((metadata.author)::text ~~ '%Kelly%'::text)
 Total runtime: 212800.076 ms
(8 rows)

Time: 212805.008 ms
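
Looking at both plans, the seq scan on data is estimated to return 125 rows but actually returns 1,167,810, and I assume that misestimate is why starting from data looks cheap to the planner. If I am reading this right, something like the following (which I have not run yet, so I may be off base) would show whether the LIKE condition alone is what is being misestimated:

explain select count(id) from data where content like '%some text%';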



I see no evidence that the nested loop tries to reduce overhead by starting with the smaller set; it wants to scan data first either way.
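
One rewrite I have been meaning to try, on the assumption (which I have only read about, not verified here) that a WITH clause is materialized before the rest of the query and so forces the smaller metadata set to be built first, is something along these lines:

-- sketch only: pull the small author match out into a CTE first
with kelly as (
    select id
    from metadata
    where author like '%Kelly%'
)
select count(data.id)
from data
join kelly on kelly.id = data.id
where data.content like '%some text%';

I do not know whether the planner would then probe data through data_pkey for just those ids or still fall back to a seq scan, but it would at least separate the two steps.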

I have run vacuum analyze, in the hope that the optimizer was simply working from bad statistics, to no avail. I seem to be unable to force postgresql to use the smaller set to narrow the search, even by expressing it as "in" on a subquery.
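
Another idea I have run across, though it sidesteps the planner question rather than answers it, is a trigram index so that the unanchored LIKE patterns themselves become indexable. A sketch of what I mean, assuming this server is on a release new enough (9.1 or later, as I understand it) for CREATE EXTENSION and for pg_trgm's GIN operator class to support LIKE; the index names are placeholders and I have not tried this:

-- pg_trgm is a contrib module
create extension pg_trgm;

-- GIN trigram indexes can be used for unanchored LIKE '%...%' searches
create index data_content_trgm_idx on data using gin (content gin_trgm_ops);
create index metadata_author_trgm_idx on metadata using gin (author gin_trgm_ops);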

Does anyone have any ideas about what I am doing wrong on such a fundamentally normal operation? I am sure I am missing something really obvious, but I can't even find discussions about forcing the use of smaller sets. I apologize for having to ask about something that is almost certainly trivial.

I am not subscribed to the list; please reply-all or offline.

Thank you,
tim




