Oleg Bartunov wrote:
On Tue, 20 Sep 2005, Philip Hallstrom wrote:
contrib/tsearch2 (
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/ )
might works for you. It might because performance depends on
cardinality of your keywords.
Seconded. We use tsearch2 to earch about 40,000 rows containing
manufacturer, brand, and product name and it returns a result almost
instantly. Before when we did normal SQL "manufacture LIKE ..., etc."
it would take 20-30 seconds.
One thing to check is the english.stop file which contains words to
skip (i, a, the, etc.). In our case we removed almost all of them
since one of our products is "7 up" (the drink) and it would remove
"up". Made it really hard to pull up 7 up in the results :)
we have "rewriting query support ( thesauri search)" in our todo
(http://www.sai.msu.su/~megera/oddmuse/index.cgi/todo).
-philip
Oleg
On Tue, 20 Sep 2005, Yonatan Ben-Nes wrote:
Hi all,
Im building a site where the users can search for products with up
to 4 diffrent keywords which all MUST match to each product which
found as a result to the search.
I got 2 tables (which are relevant to the issue :)), one is the
product table (5 million rows) and the other is the keyword table
which hold the keywords of each product (60 million rows).
The scheme of the tables is as follows:
Table "public.product"
Column | Type | Modifiers
----------------------------+---------------+---------------------
product_id | text | not null
product_name | text | not null
retail_price | numeric(10,2) | not null
etc...
Indexes:
"product_product_id_key" UNIQUE, btree (product_id)
Table "public.keyword"
Column | Type | Modifiers
-------------+---------------+-----------
product_id | text | not null
keyword | text | not null
Indexes:
"keyword_keyword" btree (keyword)
The best query which I succeded to do till now is adding the keyword
table for each keyword searched for example if someone search for
"belt" & "black" & "pants" it will create the following query:
poweraise.com=# EXPLAIN ANALYZE SELECT
product_id,product_name,product_image_url,short_product_description,long_product_description,discount,discount_type,sale_price,retail_price
FROM product INNER JOIN keyword t1 USING(product_id) INNER JOIN
keyword t2 USING(product_id) INNER JOIN keyword t3 USING(product_id)
WHERE t1.keyword='belt' AND t2.keyword='black' AND
t3.keyword='pants' LIMIT 13;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=37734.15..39957.20 rows=13 width=578) (actual
time=969.798..1520.354 rows=6 loops=1)
-> Hash Join (cost=37734.15..3754162.82 rows=21733 width=578)
(actual time=969.794..1520.337 rows=6 loops=1)
Hash Cond: ("outer".product_id = "inner".product_id)
-> Nested Loop (cost=18867.07..2858707.34 rows=55309
width=612) (actual time=82.266..1474.018 rows=156 loops=1)
-> Hash Join (cost=18867.07..2581181.09 rows=55309
width=34) (actual time=82.170..1462.104 rows=156 loops=1)
Hash Cond: ("outer".product_id =
"inner".product_id)
-> Index Scan using keyword_keyword on keyword
t2 (cost=0.00..331244.43 rows=140771 width=17) (actual
time=0.033..1307.167 rows=109007 loops=1)
Index Cond: (keyword = 'black'::text)
-> Hash (cost=18851.23..18851.23 rows=6337
width=17) (actual time=16.145..16.145 rows=0 loops=1)
-> Index Scan using keyword_keyword on
keyword t1 (cost=0.00..18851.23 rows=6337 width=17) (actual
time=0.067..11.050 rows=3294 loops=1)
Index Cond: (keyword = 'belt'::text)
-> Index Scan using product_product_id_key on product
(cost=0.00..5.01 rows=1 width=578) (actual time=0.058..0.060 rows=1
loops=156)
Index Cond: (product.product_id =
"outer".product_id)
-> Hash (cost=18851.23..18851.23 rows=6337 width=17)
(actual time=42.863..42.863 rows=0 loops=1)
-> Index Scan using keyword_keyword on keyword t3
(cost=0.00..18851.23 rows=6337 width=17) (actual time=0.073..36.120
rows=3932 loops=1)
Index Cond: (keyword = 'pants'::text)
Total runtime: 1521.441 ms
(17 rows)
Sometimes the query work fast even for 3 keywords but that doesnt
help me if at other times it take ages....
Now to find a result for 1 keyword its really flying so I also tried
to make 3 queries and do INTERSECT between them but it was found out
to be extremly slow...
Whats make this query slow as far as I understand is all the merging
between the results of each table... I tried to divide the keyword
table into lots of keywords table which each hold keywords which
start only with a specific letter, it did improve the speeds but not
in a real significant way.. tried clusters,indexes,SET
STATISTICS,WITHOUT OIDS on the keyword table and what not.. im quite
clueless...
Actually I even started to look on other solutions and maybe you can
say something about them also.. maybe they can help me:
1. Omega (From the Xapian project) - http://www.xapian.org/
2. mnoGoSearch - http://www.mnogosearch.org/doc.html
3. Swish-e - http://swish-e.org/index.html
To add on everything I want at the end to be able to ORDER BY the
results like order the product by price, but im less concerned about
that cause I saw that with cluster I can do it without any extra
overhead.
Thanks alot in advance,
Yonatan Ben-Nes
---------------------------(end of
broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@xxxxxxxxxxxxxx so that your
message can get through to the mailing list cleanly
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@xxxxxxxxxx, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@xxxxxxxxxx, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Hi all,
First of all sorry for the delay we had a problem with out mail server...
The tsearch2 looks really promising, im starting to work with it now and
ill report what ill find.
And to Alex thanks but I tried already all of the things you recommended
and sadly it didnt help.
Thanks alot for the help everyone!
Yonatan Ben-Nes
---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend