So as the op, back to the original posting.... In the real world, what should I do? Does it make sense to pull the "AND articles.indexed" clause into an outer query? Will that query simply perform poorly on other arbitrary combinations of words? I'm happy to test any given query against the same set of servers. If it involves a persistent change it has to run on a test server). For example, the Robert Haas method: # ... Total runtime: 254207.857 ms # ALTER TABLE article_words ALTER COLUMN word_key SET STATISTICS 1000; # ANALYZE VERBOSE article_words INFO: analyzing "public.article_words" INFO: "article_words": scanned 300000 of 1342374 pages, containing 64534899 live rows and 3264839 dead rows; 300000 rows in sample, 288766568 estimated total rows ANALYZE # ... Total runtime: 200591.751 ms # ALTER TABLE article_words ALTER COLUMN word_key SET STATISTICS 50; # ANALYZE VERBOSE article_words # ... Total runtime: 201204.972 ms Sadly, it made essentially zero difference. Attached. |
preproduction-20091214=# EXPLAIN ANALYZE SELECT contexts.context_key FROM contexts JOIN articles ON (articles.context_key=contexts.context_key) JOIN matview_82034 ON (contexts.context_key=matview_82034.context_key) WHERE contexts.context_key IN (SELECT context_key FROM article_words JOIN words using (word_key) WHERE word = 'insider' INTERSECT SELECT context_key FROM article_words JOIN words using (word_key) WHERE word = 'trading') AND contexts.context_key IN (SELECT a.context_key FROM virtual_ancestors a JOIN bp_categories ON (a.ancestor_key = bp_categories.context_key) WHERE lower(bp_categories.category) = 'law') AND articles.indexed; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=12861.91..62150.21 rows=2 width=4) (actual time=136.789..200744.455 rows=546 loops=1) -> Nested Loop IN Join (cost=12861.91..62148.32 rows=1 width=16) (actual time=136.777..200737.004 rows=546 loops=1) Join Filter: (a.context_key = articles.context_key) -> Nested Loop (cost=12861.91..12875.48 rows=2 width=12) (actual time=55.674..116.443 rows=1306 loops=1) -> Nested Loop (cost=12861.91..12871.68 rows=2 width=8) (actual time=55.662..97.863 rows=1306 loops=1) -> Subquery Scan "IN_subquery" (cost=12861.91..12862.18 rows=5 width=4) (actual time=55.639..75.777 rows=1473 loops=1) -> SetOp Intersect (cost=12861.91..12862.14 rows=5 width=4) (actual time=55.638..73.724 rows=1473 loops=1) -> Sort (cost=12861.91..12862.02 rows=46 width=4) (actual time=55.631..62.140 rows=17892 loops=1) Sort Key: "*SELECT* 1".context_key Sort Method: quicksort Memory: 1607kB -> Append (cost=0.00..12860.63 rows=46 width=4) (actual time=0.040..42.026 rows=17892 loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.00..6430.32 rows=23 width=4) (actual time=0.039..6.909 rows=3583 loops=1) -> Nested Loop (cost=0.00..6430.09 rows=23 width=4) (actual time=0.038..5.110 rows=3583 loops=1) -> Index Scan using words_word on words (cost=0.00..2.22 rows=1 width=4) (actual time=0.019..0.020 rows=1 loops=1) Index Cond: ((word)::text = 'insider'::text) -> Index Scan using article_words_wc on article_words (cost=0.00..6323.81 rows=8325 width=8) (actual time=0.015..3.243 rows=3583 loops=1) Index Cond: (public.article_words.word_key = public.words.word_key) -> Subquery Scan "*SELECT* 2" (cost=0.00..6430.32 rows=23 width=4) (actual time=0.036..27.166 rows=14309 loops=1) -> Nested Loop (cost=0.00..6430.09 rows=23 width=4) (actual time=0.035..20.037 rows=14309 loops=1) -> Index Scan using words_word on words (cost=0.00..2.22 rows=1 width=4) (actual time=0.014..0.015 rows=1 loops=1) Index Cond: ((word)::text = 'trading'::text) -> Index Scan using article_words_wc on article_words (cost=0.00..6323.81 rows=8325 width=8) (actual time=0.017..12.618 rows=14309 loops=1) Index Cond: (public.article_words.word_key = public.words.word_key) -> Index Scan using article_key_idx on articles (cost=0.00..1.89 rows=1 width=4) (actual time=0.011..0.012 rows=1 loops=1473) Index Cond: (articles.context_key = "IN_subquery".context_key) Filter: articles.indexed -> Index Scan using contexts_pkey on contexts (cost=0.00..1.89 rows=1 width=4) (actual time=0.010..0.011 rows=1 loops=1306) Index Cond: (contexts.context_key = articles.context_key) -> Nested Loop (cost=0.00..59848.83 rows=1124834 width=4) (actual time=0.012..142.436 rows=39193 loops=1306) -> Seq Scan on bp_categories (cost=0.00..1231.49 rows=16669 width=4) (actual time=0.006..19.730 rows=14552 loops=1306) Filter: (lower(category) = 'law'::text) -> Index Scan using virtual_ancestor_key_idx on virtual_ancestors a (cost=0.00..2.57 rows=76 width=8) (actual time=0.004..0.006 rows=3 loops=19004304) Index Cond: (a.ancestor_key = bp_categories.context_key) -> Index Scan using matview_82034_ck on matview_82034 (cost=0.00..1.88 rows=1 width=4) (actual time=0.010..0.011 rows=1 loops=546) Index Cond: (matview_82034.context_key = articles.context_key) Total runtime: 200745.372 ms (36 rows) preproduction-20091214=# ALTER TABLE article_words ALTER COLUMN word_key SET STATISTICS 1000; ALTER TABLE preproduction-20091214=# ANALYZE VERBOSE article_words; INFO: analyzing "public.article_words" INFO: "article_words": scanned 300000 of 1342374 pages, containing 64534899 live rows and 3264839 dead rows; 300000 rows in sample, 288766568 estimated total rows ANALYZE preproduction-20091214=# EXPLAIN ANALYZE SELECT contexts.context_key FROM contexts JOIN articles ON (articles.context_key=contexts.context_key) JOIN matview_82034 ON (contexts.context_key=matview_82034.context_key) WHERE contexts.context_key IN (SELECT context_key FROM article_words JOIN words using (word_key) WHERE word = 'insider' INTERSECT SELECT context_key FROM article_words JOIN words using (word_key) WHERE word = 'trading') AND contexts.context_key IN (SELECT a.context_key FROM virtual_ancestors a JOIN bp_categories ON (a.ancestor_key = bp_categories.context_key) WHERE lower(bp_categories.category) = 'law') AND articles.indexed; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=5120.81..54409.11 rows=2 width=4) (actual time=136.666..200590.858 rows=546 loops=1) -> Nested Loop IN Join (cost=5120.81..54407.22 rows=1 width=16) (actual time=136.654..200583.588 rows=546 loops=1) Join Filter: (a.context_key = articles.context_key) -> Nested Loop (cost=5120.81..5134.38 rows=2 width=12) (actual time=55.577..116.349 rows=1306 loops=1) -> Nested Loop (cost=5120.81..5130.58 rows=2 width=8) (actual time=55.563..97.425 rows=1306 loops=1) -> Subquery Scan "IN_subquery" (cost=5120.81..5121.08 rows=5 width=4) (actual time=55.500..75.615 rows=1473 loops=1) -> SetOp Intersect (cost=5120.81..5121.04 rows=5 width=4) (actual time=55.499..73.464 rows=1473 loops=1) -> Sort (cost=5120.81..5120.92 rows=46 width=4) (actual time=55.491..62.078 rows=17892 loops=1) Sort Key: "*SELECT* 1".context_key Sort Method: quicksort Memory: 1607kB -> Append (cost=0.00..5119.54 rows=46 width=4) (actual time=0.041..42.203 rows=17892 loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.00..2559.77 rows=23 width=4) (actual time=0.040..7.040 rows=3583 loops=1) -> Nested Loop (cost=0.00..2559.54 rows=23 width=4) (actual time=0.039..5.188 rows=3583 loops=1) -> Index Scan using words_word on words (cost=0.00..2.22 rows=1 width=4) (actual time=0.018..0.019 rows=1 loops=1) Index Cond: ((word)::text = 'insider'::text) -> Index Scan using article_words_wc on article_words (cost=0.00..2516.17 rows=3292 width=8) (actual time=0.017..3.299 rows=3583 loops=1) Index Cond: (public.article_words.word_key = public.words.word_key) -> Subquery Scan "*SELECT* 2" (cost=0.00..2559.77 rows=23 width=4) (actual time=0.036..27.282 rows=14309 loops=1) -> Nested Loop (cost=0.00..2559.54 rows=23 width=4) (actual time=0.035..19.972 rows=14309 loops=1) -> Index Scan using words_word on words (cost=0.00..2.22 rows=1 width=4) (actual time=0.014..0.016 rows=1 loops=1) Index Cond: ((word)::text = 'trading'::text) -> Index Scan using article_words_wc on article_words (cost=0.00..2516.17 rows=3292 width=8) (actual time=0.018..12.545 rows=14309 loops=1) Index Cond: (public.article_words.word_key = public.words.word_key) -> Index Scan using article_key_idx on articles (cost=0.00..1.89 rows=1 width=4) (actual time=0.011..0.012 rows=1 loops=1473) Index Cond: (articles.context_key = "IN_subquery".context_key) Filter: articles.indexed -> Index Scan using contexts_pkey on contexts (cost=0.00..1.89 rows=1 width=4) (actual time=0.009..0.011 rows=1 loops=1306) Index Cond: (contexts.context_key = articles.context_key) -> Nested Loop (cost=0.00..59848.83 rows=1124834 width=4) (actual time=0.012..142.295 rows=39193 loops=1306) -> Seq Scan on bp_categories (cost=0.00..1231.49 rows=16669 width=4) (actual time=0.006..19.743 rows=14552 loops=1306) Filter: (lower(category) = 'law'::text) -> Index Scan using virtual_ancestor_key_idx on virtual_ancestors a (cost=0.00..2.57 rows=76 width=8) (actual time=0.004..0.006 rows=3 loops=19004304) Index Cond: (a.ancestor_key = bp_categories.context_key) -> Index Scan using matview_82034_ck on matview_82034 (cost=0.00..1.88 rows=1 width=4) (actual time=0.010..0.010 rows=1 loops=546) Index Cond: (matview_82034.context_key = articles.context_key) Total runtime: 200591.751 ms (36 rows)
-- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance