So how about the removal of the "AND" clause? On a test server, this drops the query from 201204 to 438 ms. Is this just random, or is it a real solution that might apply to any arbitrary combination of words? Attached are three test runs: Total runtime: 201204.972 ms Total runtime: 437.766 ms Total runtime: 341.727 ms
preproduction-20091214=# ALTER TABLE article_words ALTER COLUMN word_key SET STATISTICS 50; ALTER TABLE preproduction-20091214=# ANALYZE VERBOSE article_words; INFO: analyzing "public.article_words" INFO: "article_words": scanned 75000 of 1342374 pages, containing 16124750 live rows and 825250 dead rows; 75000 rows in sample, 288605935 estimated total rows ANALYZE preproduction-20091214=# EXPLAIN ANALYZE SELECT contexts.context_key FROM contexts JOIN articles ON (articles.context_key=contexts.context_key) JOIN matview_82034 ON (contexts.context_key=matview_82034.context_key) WHERE contexts.context_key IN (SELECT context_key FROM article_words JOIN words using (word_key) WHERE word = 'insider' INTERSECT SELECT context_key FROM article_words JOIN words using (word_key) WHERE word = 'trading') AND contexts.context_key IN (SELECT a.context_key FROM virtual_ancestors a JOIN bp_categories ON (a.ancestor_key = bp_categories.context_key) WHERE lower(bp_categories.category) = 'law') AND articles.indexed; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=12665.41..61953.71 rows=2 width=4) (actual time=140.799..201204.025 rows=546 loops=1) -> Nested Loop IN Join (cost=12665.41..61951.82 rows=1 width=16) (actual time=140.786..201196.526 rows=546 loops=1) Join Filter: (a.context_key = articles.context_key) -> Nested Loop (cost=12665.41..12678.98 rows=2 width=12) (actual time=55.453..116.790 rows=1306 loops=1) -> Nested Loop (cost=12665.41..12675.18 rows=2 width=8) (actual time=55.439..98.132 rows=1306 loops=1) -> Subquery Scan "IN_subquery" (cost=12665.41..12665.68 rows=5 width=4) (actual time=55.415..76.025 rows=1473 loops=1) -> SetOp Intersect (cost=12665.41..12665.64 rows=5 width=4) (actual time=55.413..73.885 rows=1473 loops=1) -> Sort (cost=12665.41..12665.52 rows=46 width=4) (actual time=55.406..62.222 rows=17892 loops=1) Sort Key: "*SELECT* 1".context_key Sort Method: quicksort Memory: 1607kB -> Append (cost=0.00..12664.14 rows=46 width=4) (actual time=0.060..42.065 rows=17892 loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.00..6332.07 rows=23 width=4) (actual time=0.059..6.962 rows=3583 loops=1) -> Nested Loop (cost=0.00..6331.84 rows=23 width=4) (actual time=0.058..5.148 rows=3583 loops=1) -> Index Scan using words_word on words (cost=0.00..2.22 rows=1 width=4) (actual time=0.018..0.019 rows=1 loops=1) Index Cond: ((word)::text = 'insider'::text) -> Index Scan using article_words_wc on article_words (cost=0.00..6227.18 rows=8195 width=8) (actual time=0.036..3.275 rows=3583 loops=1) Index Cond: (public.article_words.word_key = public.words.word_key) -> Subquery Scan "*SELECT* 2" (cost=0.00..6332.07 rows=23 width=4) (actual time=0.037..27.136 rows=14309 loops=1) -> Nested Loop (cost=0.00..6331.84 rows=23 width=4) (actual time=0.035..19.912 rows=14309 loops=1) -> Index Scan using words_word on words (cost=0.00..2.22 rows=1 width=4) (actual time=0.015..0.017 rows=1 loops=1) Index Cond: ((word)::text = 'trading'::text) -> Index Scan using article_words_wc on article_words (cost=0.00..6227.18 rows=8195 width=8) (actual time=0.018..12.464 rows=14309 loops=1) Index Cond: (public.article_words.word_key = public.words.word_key) -> Index Scan using article_key_idx on articles (cost=0.00..1.89 rows=1 width=4) (actual time=0.011..0.012 rows=1 loops=1473) Index Cond: (articles.context_key = "IN_subquery".context_key) Filter: articles.indexed -> Index Scan using contexts_pkey on contexts (cost=0.00..1.89 rows=1 width=4) (actual time=0.010..0.011 rows=1 loops=1306) Index Cond: (contexts.context_key = articles.context_key) -> Nested Loop (cost=0.00..59848.83 rows=1124834 width=4) (actual time=0.012..142.771 rows=39193 loops=1306) -> Seq Scan on bp_categories (cost=0.00..1231.49 rows=16669 width=4) (actual time=0.006..19.719 rows=14552 loops=1306) Filter: (lower(category) = 'law'::text) -> Index Scan using virtual_ancestor_key_idx on virtual_ancestors a (cost=0.00..2.57 rows=76 width=8) (actual time=0.004..0.006 rows=3 loops=19004304) Index Cond: (a.ancestor_key = bp_categories.context_key) -> Index Scan using matview_82034_ck on matview_82034 (cost=0.00..1.88 rows=1 width=4) (actual time=0.010..0.011 rows=1 loops=546) Index Cond: (matview_82034.context_key = articles.context_key) Total runtime: 201204.972 ms (36 rows) preproduction-20091214=# EXPLAIN ANALYZE SELECT contexts.context_key FROM contexts JOIN articles ON (articles.context_key=contexts.context_key) JOIN matview_82034 ON (contexts.context_key=matview_82034.context_key) WHERE contexts.context_key IN (SELECT context_key FROM article_words JOIN words using (word_key) WHERE word = 'insider' INTERSECT SELECT context_key FROM article_words JOIN words using (word_key) WHERE word = 'trading') AND contexts.context_key IN (SELECT a.context_key FROM virtual_ancestors a JOIN bp_categories ON (a.ancestor_key = bp_categories.context_key) WHERE lower(bp_categories.category) = 'law'); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=90968.66..95382.31 rows=5 width=4) (actual time=284.789..437.241 rows=648 loops=1) -> Nested Loop (cost=90968.66..95378.53 rows=2 width=16) (actual time=284.778..338.030 rows=649 loops=1) -> Hash IN Join (cost=90968.66..95374.73 rows=2 width=12) (actual time=284.764..333.145 rows=649 loops=1) Hash Cond: (contexts.context_key = a.context_key) -> Nested Loop (cost=12665.41..12675.20 rows=5 width=8) (actual time=55.557..81.206 rows=1473 loops=1) -> Subquery Scan "IN_subquery" (cost=12665.41..12665.68 rows=5 width=4) (actual time=55.537..69.892 rows=1473 loops=1) -> SetOp Intersect (cost=12665.41..12665.64 rows=5 width=4) (actual time=55.535..69.075 rows=1473 loops=1) -> Sort (cost=12665.41..12665.52 rows=46 width=4) (actual time=55.528..60.439 rows=17892 loops=1) Sort Key: "*SELECT* 1".context_key Sort Method: quicksort Memory: 1607kB -> Append (cost=0.00..12664.14 rows=46 width=4) (actual time=0.040..42.014 rows=17892 loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.00..6332.07 rows=23 width=4) (actual time=0.038..6.953 rows=3583 loops=1) -> Nested Loop (cost=0.00..6331.84 rows=23 width=4) (actual time=0.038..5.129 rows=3583 loops=1) -> Index Scan using words_word on words (cost=0.00..2.22 rows=1 width=4) (actual time=0.019..0.020 rows=1 loops=1) Index Cond: ((word)::text = 'insider'::text) -> Index Scan using article_words_wc on article_words (cost=0.00..6227.18 rows=8195 width=8) (actual time=0.015..3.292 rows=3583 loops=1) Index Cond: (public.article_words.word_key = public.words.word_key) -> Subquery Scan "*SELECT* 2" (cost=0.00..6332.07 rows=23 width=4) (actual time=0.037..27.131 rows=14309 loops=1) -> Nested Loop (cost=0.00..6331.84 rows=23 width=4) (actual time=0.036..19.717 rows=14309 loops=1) -> Index Scan using words_word on words (cost=0.00..2.22 rows=1 width=4) (actual time=0.016..0.018 rows=1 loops=1) Index Cond: ((word)::text = 'trading'::text) -> Index Scan using article_words_wc on article_words (cost=0.00..6227.18 rows=8195 width=8) (actual time=0.017..12.360 rows=14309 loops=1) Index Cond: (public.article_words.word_key = public.words.word_key) -> Index Scan using contexts_pkey on contexts (cost=0.00..1.89 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=1473) Index Cond: (contexts.context_key = "IN_subquery".context_key) -> Hash (cost=59848.83..59848.83 rows=1124834 width=4) (actual time=220.171..220.171 rows=48207 loops=1) -> Nested Loop (cost=0.00..59848.83 rows=1124834 width=4) (actual time=0.029..175.278 rows=48207 loops=1) -> Seq Scan on bp_categories (cost=0.00..1231.49 rows=16669 width=4) (actual time=0.015..23.290 rows=16669 loops=1) Filter: (lower(category) = 'law'::text) -> Index Scan using virtual_ancestor_key_idx on virtual_ancestors a (cost=0.00..2.57 rows=76 width=8) (actual time=0.004..0.007 rows=3 loops=16669) Index Cond: (a.ancestor_key = bp_categories.context_key) -> Index Scan using article_key_idx on articles (cost=0.00..1.89 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=649) Index Cond: (articles.context_key = contexts.context_key) -> Index Scan using matview_82034_ck on matview_82034 (cost=0.00..1.88 rows=1 width=4) (actual time=0.151..0.152 rows=1 loops=649) Index Cond: (matview_82034.context_key = articles.context_key) Total runtime: 437.766 ms (36 rows) preproduction-20091214=# EXPLAIN ANALYZE SELECT contexts.context_key FROM articles JOIN contexts using (context_key) WHERE articles.context_key IN ( SELECT contexts.context_key FROM contexts JOIN matview_82034 ON (contexts.context_key=matview_82034.context_key) WHERE contexts.context_key IN (SELECT context_key FROM article_words JOIN words using (word_key) WHERE word = 'insider' INTERSECT SELECT context_key FROM article_words JOIN words using (word_key) WHERE word = 'trading') AND contexts.context_key IN (SELECT a.context_key FROM virtual_ancestors a JOIN bp_categories ON (a.ancestor_key = bp_categories.context_key) WHERE lower(bp_categories.category) = 'law') ) AND articles.indexed; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=95378.52..95391.87 rows=2 width=4) (actual time=332.545..341.194 rows=546 loops=1) -> Nested Loop (cost=95378.52..95388.07 rows=2 width=20) (actual time=332.531..337.526 rows=546 loops=1) -> HashAggregate (cost=95378.52..95378.57 rows=5 width=16) (actual time=332.514..332.830 rows=648 loops=1) -> Nested Loop (cost=90968.66..95378.51 rows=5 width=16) (actual time=285.869..331.943 rows=648 loops=1) -> Hash IN Join (cost=90968.66..95374.73 rows=2 width=12) (actual time=285.858..327.625 rows=649 loops=1) Hash Cond: (public.contexts.context_key = a.context_key) -> Nested Loop (cost=12665.41..12675.20 rows=5 width=8) (actual time=55.625..79.734 rows=1473 loops=1) -> Subquery Scan "IN_subquery" (cost=12665.41..12665.68 rows=5 width=4) (actual time=55.602..69.942 rows=1473 loops=1) -> SetOp Intersect (cost=12665.41..12665.64 rows=5 width=4) (actual time=55.601..69.115 rows=1473 loops=1) -> Sort (cost=12665.41..12665.52 rows=46 width=4) (actual time=55.593..60.422 rows=17892 loops=1) Sort Key: "*SELECT* 1".context_key Sort Method: quicksort Memory: 1607kB -> Append (cost=0.00..12664.14 rows=46 width=4) (actual time=0.039..41.913 rows=17892 loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.00..6332.07 rows=23 width=4) (actual time=0.039..6.911 rows=3583 loops=1) -> Nested Loop (cost=0.00..6331.84 rows=23 width=4) (actual time=0.037..5.043 rows=3583 loops=1) -> Index Scan using words_word on words (cost=0.00..2.22 rows=1 width=4) (actual time=0.018..0.019 rows=1 loops=1) Index Cond: ((word)::text = 'insider'::text) -> Index Scan using article_words_wc on article_words (cost=0.00..6227.18 rows=8195 width=8) (actual time=0.015..3.202 rows=3583 loops=1) Index Cond: (public.article_words.word_key = public.words.word_key) -> Subquery Scan "*SELECT* 2" (cost=0.00..6332.07 rows=23 width=4) (actual time=0.035..27.132 rows=14309 loops=1) -> Nested Loop (cost=0.00..6331.84 rows=23 width=4) (actual time=0.034..20.008 rows=14309 loops=1) -> Index Scan using words_word on words (cost=0.00..2.22 rows=1 width=4) (actual time=0.014..0.015 rows=1 loops=1) Index Cond: ((word)::text = 'trading'::text) -> Index Scan using article_words_wc on article_words (cost=0.00..6227.18 rows=8195 width=8) (actual time=0.018..12.502 rows=14309 loops=1) Index Cond: (public.article_words.word_key = public.words.word_key) -> Index Scan using contexts_pkey on contexts (cost=0.00..1.89 rows=1 width=4) (actual time=0.005..0.005 rows=1 loops=1473) Index Cond: (public.contexts.context_key = "IN_subquery".context_key) -> Hash (cost=59848.83..59848.83 rows=1124834 width=4) (actual time=221.465..221.465 rows=48207 loops=1) -> Nested Loop (cost=0.00..59848.83 rows=1124834 width=4) (actual time=0.030..174.941 rows=48207 loops=1) -> Seq Scan on bp_categories (cost=0.00..1231.49 rows=16669 width=4) (actual time=0.015..23.377 rows=16669 loops=1) Filter: (lower(category) = 'law'::text) -> Index Scan using virtual_ancestor_key_idx on virtual_ancestors a (cost=0.00..2.57 rows=76 width=8) (actual time=0.004..0.007 rows=3 loops=16669) Index Cond: (a.ancestor_key = bp_categories.context_key) -> Index Scan using matview_82034_ck on matview_82034 (cost=0.00..1.88 rows=1 width=4) (actual time=0.005..0.006 rows=1 loops=649) Index Cond: (matview_82034.context_key = public.contexts.context_key) -> Index Scan using article_key_idx on articles (cost=0.00..1.89 rows=1 width=4) (actual time=0.006..0.006 rows=1 loops=648) Index Cond: (articles.context_key = public.contexts.context_key) Filter: articles.indexed -> Index Scan using contexts_pkey on contexts (cost=0.00..1.89 rows=1 width=4) (actual time=0.005..0.006 rows=1 loops=546) Index Cond: (public.contexts.context_key = articles.context_key) Total runtime: 341.727 ms (41 rows) preproduction-20091214=# select count(*),indexed from articles group by indexed; count | indexed --------+--------- 499730 | f 341930 | t
-- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance