Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




Your Query :

SELECT contexts.context_key FROM contexts
JOIN articles ON (articles.context_key=contexts.context_key)
JOIN matview_82034 ON (contexts.context_key=matview_82034.context_key)
WHERE contexts.context_key IN
(SELECT context_key FROM article_words JOIN words using (word_key) WHERE word = 'insider'
     INTERSECT
SELECT context_key FROM article_words JOIN words using (word_key) WHERE word = 'trading')
AND contexts.context_key IN
(SELECT a.context_key FROM virtual_ancestors a JOIN bp_categories ON (a.ancestor_key = bp_categories.context_key)
         WHERE lower(bp_categories.category) = 'law') AND articles.indexed;


I guess this is some form of keyword search, like :
- search for article
- with keywords "insider" and "trading"
- and belongs to a subcategory of "law"

The way you do it is exactly the same as the way phpBB forum implements it, in the case you use a database that doesn't support full text search. It is a fallback mechanism only meant for small forums on old versions of MySQL, because it is extremely slow.

Even your faster timing (7500 ms) is extremely slow.

Option 1 :

a) Instead of building your own keywords table, use Postgres' fulltext search, which is a lot smarter about combining keywords than using INTERSECT. You can either index the entire article, or use a separate keyword field, or both.

b) If an article belongs to only one category, use an integer field. If, as is most often the case, an article can belong to several categories, use gist. When an article belongs to categories 1,2,3, set a column article_categories to the integer array {1,2,3}::INTEGER[]. Then, use a gist index on it.

You can then do a SELECT from articles (only one table) using an AND on the intersection of article_categories with an array of the required categories, and using Postgres' full text search on keywords.

This will most likely result in a Bitmap Scan, which will do the ANDing much faster than any other solution.

Alternately, you can also use keywords like category_1234, stuff everything in your keywords column, and use only Fulltext search.

You should this solution first, it works really well. When the data set becomes quite a bit larger than your RAM, it can get slow, though.

Option 2 :

Postgres' full text search is perfectly integrated and has benefits : fast, high write concurrency, etc. However full text search can be made much faster with some compromises.

For instance, I have tried Xapian : it is a lot faster than Postgres for full text search (and more powerful too), but the price you pay is
- a bit of work to integrate it
- I suggest using triggers and a Python indexer script running in the background to update the index
	- You can't SQL query it, so you need some interfacing
- updates are not concurrent (single-writer).

So, if you don't make lots of updates, Xapian may work for you. Its performance is unbelievable, even on huge datasets.

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux