I am struggeling with the following query which fetches a random subset
of 200 questions that matches certain tags within certain languages.
However, the query takes forever to evaluate, even though I have a
"limit 200" appended. Any ideas on how to optimize it?
QUERY: ================================================
SELECT distinct q.question_id
FROM question_tags qt, questions q
WHERE q.question_id = qt.question_id
AND q.STATUS = 1
AND not q.author_id = 105
AND ((qt.language_id = 5 and qt.tag_id in
(1101,917,734,550,367,183)) or (qt.language_id = 4 and qt.tag_id in
(856,428)) or (qt.language_id =
3 and qt.tag_id in (1156,1028,899,771,642,514,385,257,128)) or
(qt.language_id = 2 and qt.tag_id in
(1193,1101,1009,917,826,734,642,550,458,367,275,183,91)))
and q.question_id not in (413)
LIMIT 200
EXPLAIN ANALYZE: =========================================
Limit (cost=1.50..1267.27 rows=200 width=4) (actual
time=278.169..880.934 rows=200 loops=1)
-> Unique (cost=1.50..317614.50 rows=50185 width=4) (actual
time=278.165..880.843 rows=200 loops=1)
-> Merge Join (cost=1.50..317489.04 rows=50185 width=4)
(actual time=278.162..880.579 rows=441 loops=1)
Merge Cond: (qt.question_id = q.question_id)
-> Index Scan using question_tags_question_id on
question_tags qt (cost=0.00..301256.96 rows=82051 width=4) (actual
time=24.171..146.811 rows=6067 loops=1)
Filter: (((language_id = 5) AND (tag_id = ANY
('{1101,917,734,550,367,183}'::integer[]))) OR ((language_id = 4) AND
(tag_id = ANY ('{856,428}'::integer[]))) OR ((language_id = 3) AND
(tag_id = ANY ('{1156,1028,899,771,642,514,385,257,128}'::integer[])))
OR ((language_id = 2) AND (tag_id = ANY
('{1193,1101,1009,917,826,734,642,550,458,367,275,183,91}'::integer[]))))
-> Index Scan using questions_pkey on questions q
(cost=0.00..15464.12 rows=83488 width=4) (actual time=222.956..731.737
rows=1000 loops=1)
Filter: ((q.author_id <> 105) AND (q.question_id
<> 413) AND (q.status = 1))
Total runtime: 881.152 ms
(9 rows)
An index on (language_id,tag_id) should be the first thing to try.
Or perhaps even (status,language_id,tag_id) or (language_id, tad_id,
status) (but that depends on the stats on "status" column).
An index on author_id will probably not be useful for this particular
query because your condition is "author_id != constant".
Also CLUSTER question_tags on (language_id, tad_id).
What is the database size versus RAM ? You must have a hell of a lot of
questions to make this slow... (or bloat ?)
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance