Very Fast Version: Recently my database stopped respecting one of my indexes, which took a query that should run in "subsecond response time" and turning it into something that with small data sets runs in the 7-10 minute range and with large data sets runs in the 30 minute - eternity range. Explain Analyze tells me that what used to be an Index Scan has become a Seq Scan, doing a full table scan through 140 million records. Any thoughts on why that happens? +++++++++++++++++++ Longer version: I have a very large database (billion+ records) that has begun not respecting indexes on some queries, resulting in what used to be "instant answers" now taking many minutes and I'm trying to figure out why. I'll try to simplify the request this way: I have a database of many hundreds of millions of email messages, and information about those messages. There is a table that tells me when I received emails and from what source. There is another table that tells what URLs were seen in the bodies of those emails and there is a table that links those two tables together. Since many emails can contain the same URL, and many URLs can be seen in each email, we divide like this: Tables =========== emails link_urls email_links email.message_id = link_url.message_id linkurl.urlid = email_links.urlid One attribute of the URL is the "hostname" portion. In my puzzle, I have a table of hostnames, and I want to get statistics about which emails contained URLs that pointed to those hostnames. A very simplified version of the query could be: select email.stuff from email natural join link_url natural join email_links where hostname = 'foo.bar.com'; We know that there were two emails that advertised foo.bar.com, so the objective is to link the presence of foo.bar.com to a URLID, use the link_url table to find the respective message_id and then go ask the email table for details about those two messages. My "explain analyze" of that query takes seven minutes to complete, despite the fact that I have an index on message_id, and urlid, and machine where they occur in each of the three tables. We broke the query down into it's components, and confirmed that what we consider the three natural pieces of the query each run "blazing fast", each using the proper indexes. But when we put them all together, the index gets ignored, and we sequential scan a 140 million row table. Which is slow. =============================== If we break it into pieces ... a human might think of the pieces as: ============ explain analyze select urlid from email_links where machine = 'foo.bar.com'; QUERY PLAN ------------------------------------------------------------------------------------------- Bitmap Heap Scan on email_links (cost=97.10..9650.58 rows=2457 width=33) (actual time=0.049..0.049 rows=1 loops=1) Recheck Cond: (machine = 'foo.bar.com'::text) -> Bitmap Index Scan on hostdex (cost=0.00..96.49 rows=2457 width=0) (actual time=0.039..0.039 rows=1 loops=1) Index Cond: (machine = 'foo.bar.com'::text) Total runtime: 0.066 ms (5 rows) ============= => explain analyze select message_id from link_url where urlid = '9de6440fcc089c654806bd4c853c76f1'; QUERY PLAN ------------------------------------------------------------------------------------------- Bitmap Heap Scan on link_url (cost=65.20..5680.14 rows=1437 width=4) (actual time=0.074..0.083 rows=2 loops=1) Recheck Cond: (urlid = '9de6440fcc089c654806bd4c853c76f1'::text) -> Bitmap Index Scan on link_url_hkey (cost=0.00..64.84 rows=1437 width=0) (actual time=0.021..0.021 rows=2 loops=1) Index Cond: (urlid = '9de6440fcc089c654806bd4c853c76f1'::text) Total runtime: 0.109 ms (5 rows) ================= explain analyze select stuff from email where message_id in (78085350, 78088168); QUERY PLAN ------------------------------------------------------------------------------------------- Bitmap Heap Scan on email (cost=17.19..25.21 rows=2 width=8) (actual time=0.068..0.077 rows=2 loops=1) Recheck Cond: (message_id = ANY ('{78085350,78088168}'::integer[])) -> Bitmap Index Scan on email_pkey (cost=0.00..17.19 rows=2 width=0) (actual time=0.054..0.054 rows=2 loops=1) Index Cond: (message_id = ANY ('{78085350,78088168}'::integer[])) Total runtime: 0.100 ms (5 rows) ++++++++++++++++++++++ In my simple mind that should take (.1 + .109 + .066) = FAST. Here's the "Explain Analyze" for the combined version: ++++++++++++++++++++++++ explain analyze select email.stuff from email natural join link_url natural join email_link where machine = 'foo.bar.com'; QUERY PLAN ------------------------------------------------------------------------------------------- Merge Join (cost=3949462.38..8811048.82 rows=4122698 width=7) (actual time=771578.076..777749.755 rows=3 loops=1) Merge Cond: (email.message_id = link_url.message_id) -> Index Scan using email_pkey on email (cost=0.00..4561330.19 rows=79154951 width=11) (actual time=0.041..540883.445 rows=79078427 loops=1) -> Materialize (cost=3948986.49..4000520.21 rows=4122698 width=4) (actual time=227023.820..227023.823 rows=3 loops=1) -> Sort (cost=3948986.49..3959293.23 rows=4122698 width=4) (actual time=227023.816..227023.819 rows=3 loops=1) Sort Key: link_url.message_id Sort Method: quicksort Memory: 25kB -> Hash Join (cost=9681.33..3326899.30 rows=4122698 width=4) (actual time=216443.617..227023.798 rows=3 loops=1) Hash Cond: (link_url.urlid = email_link.urlid) -> Seq Scan on link_url (cost=0.00..2574335.33 rows=140331133 width=37) (actual time=0.013..207980.261 rows=140330592 lo ops=1) -> Hash (cost=9650.62..9650.62 rows=2457 width=33) (actual time=0.074..0.074 rows=1 loops=1) -> Bitmap Heap Scan on email_link (cost=97.10..9650.62 rows=2457 width=33) (actual time=0.072..0.072 rows=1 loops=1 ) Recheck Cond: (hostname = 'foo.bar.com'::text) -> Bitmap Index Scan on hostdex (cost=0.00..96.49 rows=2457 width=0) (actual time=0.060..0.060 rows=1 loops=1 ) Index Cond: (hostname = 'foo.bar.com'::text) Total runtime: 777749.820 ms (16 rows) ++++++++++++++++++ See that "Seq Scan on link_url"? We can't figure out why that is there! We should be scanning for a matching "urlid" and we have an index on "urlid"? When this is happening in a "two table" version of this problem, we can get temporary relief by giving the statement: set enable_seqscan = false; But in the "three table" version (which is itself a simplification of the real problem, which is that instead of looking for 'foo.bar.com', I'm looking for all the hostnames in the table "testurls") the "enable_seqscan = false" doesn't seem to do anything. ++++++++++++++++++ Does anyone have suggestions as to why my database has started ignoring this index in certain circumstances? We've tried a variety of different joins and subselects and can't seem to beat the "combined query" index failure. The work-around has been to actually write code that makes the query for urlids, and then manually does the query into the table for message_ids, and then manually does the query for email.stuff. It's bad fast, but we can't figure out why 'the old way' quit working. Suggestions welcome! -- ---------------------------------------------------------- Gary Warner ----------------------------------------------------------- -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance