tsearch2 question (was: Poor performance on seq scan)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Tom Lane wrote:
Only if the index is capable of disgorging the original value of the
indexed column, a fact not in evidence in general (counterexample:
polygons indexed by their bounding boxes in an r-tree).  But yeah,
it's interesting to think about applying filters at the index fetch
step for index types that can hand back full values.  This has been
discussed before --- I think we had gotten as far as speculating about
doing joins with just index values.  See eg here:
http://archives.postgresql.org/pgsql-hackers/2004-05/msg00944.php
A lot of the low-level concerns have already been dealt with in order to
support bitmap indexscans, but applying non-indexable conditions before
fetching from the heap is still not done.
To overcome this problem, I created a smaller "shadow" table:

CREATE TABLE product_search
(
 id int8 NOT NULL,
 name_desc text,
 CONSTRAINT pk_product_search PRIMARY KEY (id)
);

insert into product_search
 select
id, name || ' ' || coalesce(description,'')
 from product;


Obviously, this is almost like an index, but I need to maintain it manually. I'm able to search with

zeusd1=> explain analyze select id from product_search where name_desc like '%Mug%';
                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Seq Scan on product_search (cost=0.00..54693.34 rows=36487 width=8) (actual time=20.036..2541.971 rows=91399 loops=1)
  Filter: (name_desc ~~ '%Mug%'::text)
Total runtime: 2581.272 ms
(3 rows)

The total runtime remains below 3 sec in all cases. Of course I still need to join the main table to the result:

explain analyze select s.id,p.name from product_search s inner join product p on (p.id = s.id) where s.name_desc like '%Tiffany%'

QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ Nested Loop (cost=0.00..55042.84 rows=58 width=40) (actual time=164.437..3982.610 rows=117 loops=1) -> Seq Scan on product_search s (cost=0.00..54693.34 rows=58 width=8) (actual time=103.651..2717.914 rows=117 loops=1)
        Filter: (name_desc ~~ '%Tiffany%'::text)
-> Index Scan using pk_product_id on product p (cost=0.00..6.01 rows=1 width=40) (actual time=10.793..10.796 rows=1 loops=117)
        Index Cond: (p.id = "outer".id)
Total runtime: 4007.283 ms
(6 rows)

Took 4 seconds. Awesome! With the original table, it used to be one or two minutes!

Now you can ask, why am I not using tsearch2 for this? Here is answer:

CREATE TABLE product_search
(
 id int8 NOT NULL,
 ts_name_desc tsvector,
 CONSTRAINT pk_product_search PRIMARY KEY (id)
);

insert into product_search
 select
id, to_tsvector(name || ' ' coalesce(description,''))
 from product;

CREATE INDEX idx_product_search_ts_name_desc ON product_search USING gist (ts_name_desc);
VACUUM product_search;

zeusd1=> explain analyze select id from product_search where ts_name_desc @@ to_tsquery('mug'); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on product_search (cost=25.19..3378.20 rows=912 width=8) (actual time=954.669..13112.009 rows=91434 loops=1)
 Filter: (ts_name_desc @@ '''mug'''::tsquery)
-> Bitmap Index Scan on idx_product_search_ts_name_desc (cost=0.00..25.19 rows=912 width=0) (actual time=932.455..932.455 rows=91436 loops=1)
       Index Cond: (ts_name_desc @@ '''mug'''::tsquery)
Total runtime: 13155.724 ms
(5 rows)

zeusd1=> explain analyze select id from product_search where ts_name_desc @@ to_tsquery('tiffany'); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on product_search (cost=25.19..3378.20 rows=912 width=8) (actual time=13151.725..13639.112 rows=76 loops=1)
 Filter: (ts_name_desc @@ '''tiffani'''::tsquery)
-> Bitmap Index Scan on idx_product_search_ts_name_desc (cost=0.00..25.19 rows=912 width=0) (actual time=13123.705..13123.705 rows=81 loops=1)
       Index Cond: (ts_name_desc @@ '''tiffani'''::tsquery)
Total runtime: 13639.478 ms
(5 rows)

At least 13 seconds, and the main table is not joined yet. Can anybody explain to me, why the seq scan is faster than the bitmap index? In the last example there were only 81 rows returned, but it took more than 13 seconds. :( Even if the whole table can be cached into memory (which isn't the case), the bitmap index should be much faster. Probably there is a big problem with my schema but I cannot find it. What am I doing wrong?

Thanks,

  Laszlo



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux