Re: Slow search.. quite clueless

Yonatan Ben-Nes <da@xxxxxxxxxxxx> · Mon, 26 Sep 2005 19:56:33 +0200

Oleg Bartunov wrote:
contrib/tsearch2 ( 
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/ )
might works for you. It might because performance depends on cardinality 
of your keywords.

     Oleg
On Tue, 20 Sep 2005, Yonatan Ben-Nes wrote:

Hi all,

Im building a site where the users can search for products with up to 
4 diffrent keywords which all MUST match to each product which found 
as a result to the search.

I got 2 tables (which are relevant to the issue :)), one is the 
product table (5 million rows) and the other is the keyword table 
which hold the keywords of each product (60 million rows).

The scheme of the tables is as follows:

                     Table "public.product"
          Column           |     Type      |      Modifiers
----------------------------+---------------+---------------------
product_id                 | text          | not null
product_name               | text          | not null
retail_price               | numeric(10,2) | not null
etc...
Indexes:
   "product_product_id_key" UNIQUE, btree (product_id)

        Table "public.keyword"
  Column    |     Type      | Modifiers
-------------+---------------+-----------
product_id  | text          | not null
keyword     | text          | not null
Indexes:
   "keyword_keyword" btree (keyword)

The best query which I succeded to do till now is adding the keyword 
table for each keyword searched for example if someone search for 
"belt" & "black" & "pants" it will create the following query:

poweraise.com=# EXPLAIN ANALYZE SELECT 
product_id,product_name,product_image_url,short_product_description,long_product_description,discount,discount_type,sale_price,retail_price 
FROM product INNER JOIN keyword t1 USING(product_id) INNER JOIN 
keyword t2 USING(product_id) INNER JOIN keyword t3 USING(product_id) 
WHERE t1.keyword='belt' AND t2.keyword='black' AND t3.keyword='pants' 
LIMIT 13;

      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------- 

Limit  (cost=37734.15..39957.20 rows=13 width=578) (actual 
time=969.798..1520.354 rows=6 loops=1)
  ->  Hash Join  (cost=37734.15..3754162.82 rows=21733 width=578) 
(actual time=969.794..1520.337 rows=6 loops=1)
        Hash Cond: ("outer".product_id = "inner".product_id)
        ->  Nested Loop  (cost=18867.07..2858707.34 rows=55309 
width=612) (actual time=82.266..1474.018 rows=156 loops=1)
              ->  Hash Join  (cost=18867.07..2581181.09 rows=55309 
width=34) (actual time=82.170..1462.104 rows=156 loops=1)
                    Hash Cond: ("outer".product_id = "inner".product_id)
                    ->  Index Scan using keyword_keyword on keyword t2 
(cost=0.00..331244.43 rows=140771 width=17) (actual 
time=0.033..1307.167 rows=109007 loops=1)
                          Index Cond: (keyword = 'black'::text)
                    ->  Hash  (cost=18851.23..18851.23 rows=6337 
width=17) (actual time=16.145..16.145 rows=0 loops=1)
                          ->  Index Scan using keyword_keyword on 
keyword t1 (cost=0.00..18851.23 rows=6337 width=17) (actual 
time=0.067..11.050 rows=3294 loops=1)
                                Index Cond: (keyword = 'belt'::text)
              ->  Index Scan using product_product_id_key on product 
(cost=0.00..5.01 rows=1 width=578) (actual time=0.058..0.060 rows=1 
loops=156)
                    Index Cond: (product.product_id = "outer".product_id)
        ->  Hash  (cost=18851.23..18851.23 rows=6337 width=17) (actual 
time=42.863..42.863 rows=0 loops=1)
              ->  Index Scan using keyword_keyword on keyword t3 
(cost=0.00..18851.23 rows=6337 width=17) (actual time=0.073..36.120 
rows=3932 loops=1)
                    Index Cond: (keyword = 'pants'::text)
Total runtime: 1521.441 ms
(17 rows)

Sometimes the query work fast even for 3 keywords but that doesnt help 
me if at other times it take ages....

Now to find a result for 1 keyword its really flying so I also tried 
to make 3 queries and do INTERSECT between them but it was found out 
to be extremly slow...

Whats make this query slow as far as I understand is all the merging 
between the results of each table... I tried to divide the keyword 
table into lots of keywords table which each hold keywords which start 
only with a specific letter, it did improve the speeds but not in a 
real significant way.. tried clusters,indexes,SET STATISTICS,WITHOUT 
OIDS on the keyword table and what not.. im quite clueless...

Actually I even started to look on other solutions and maybe you can 
say something about them also.. maybe they can help me:
1. Omega (From the Xapian project) - http://www.xapian.org/
2. mnoGoSearch - http://www.mnogosearch.org/doc.html
3. Swish-e - http://swish-e.org/index.html

To add on everything I want at the end to be able to ORDER BY the 
results like order the product by price, but im less concerned about 
that cause I saw that with cluster I can do it without any extra 
overhead.

Thanks alot in advance,
Yonatan Ben-Nes

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
     subscribe-nomail command to majordomo@xxxxxxxxxxxxxx so that your
     message can get through to the mailing list cleanly

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@xxxxxxxxxx, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
      choose an index scan if your joining column's datatypes do not
      match

Hi again everyone,

Oleg I tried tsearch2 and happily it does work wonderfully for me 
returning results extremly fast and actually its working even better 
then I wanted with all of those neat features like: lexem, weight & stop 
words.

I got only one problem which is when I want the results to be ordered by 
a diffrent field (like print INT field) it takes quite alot of time for 
it to do it if the query can return lots of results (for example search 
for the word "computer") and thats even if I limit the results.
The best way to improve its speed for such quereies (that I've found...) 
is to create an index on the field which I want to order by and using it 
CLUSTER the table, after the clustering I drop the the index so it won't 
be used when I run queries with ORDER BY on that field, that seem to 
improve the speed, if anyone got a better idea ill be glad to hear it.

Anyway thanks alot everyone!
  Ben-Nes Yonatan

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend