GIN index always doing Re-check condition, postgres 9.1

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



I have a table with roughly 300 000 rows, each row containing two large strings - one is news article HTML, the other is news article plaintext. The table has a bigint primary key.

A GIN index is used to do fulltext search on the plaintext part. All I want to retrieve when I do fulltext search is values of primary key column.

With a popular word, the amount of results from fulltext search query can be pretty high - a query can return 23 000 rows and some can more, and will return more as the database continues to grow.

The problem I have is that postgres always does Re-check condition step for my request. That query with 23k rows takes 20 seconds to execute, and EXPLAIN shows that almost all of that time is spent
re-checking condition. The second time I run the same query, I get results immediately.
What this means is that every time user does a search for some word no one searched before, he has to wait a very long time, which is unacceptable for us.

Is this happening by design, or am I doing something wrong? The way I see it, since docs say GIN indexes are lossless, the database should be able to fetch just primary key values for matching rows for me.

Here's schema, query, explain and database version:

CREATE TABLE kard_md.fulldata
(
  id_iu bigint NOT NULL,
  url character varying NOT NULL,
  original text,
  edited text,
  plaintext text,
  date timestamp without time zone,
  CONSTRAINT fulldata_pkey PRIMARY KEY (id_iu)
);

CREATE INDEX fulldata_plaintext_idx
  ON kard_md.fulldata
  USING gin
  (to_tsvector('russian'::regconfig, plaintext));


EXPLAIN (ANALYZE, BUFFERS) select id_iu from kard_md.fulldata where to_tsvector('russian',fulldata.plaintext) @@ plainto_tsquery('russian','москва');

1st run:
Bitmap Heap Scan on fulldata  (cost=266.79..39162.57 rows=23069 width=8) (actual time=135.727..19499.667 rows=23132 loops=1)
  Recheck Cond: (to_tsvector('russian'::regconfig, plaintext) @@ '''москв'''::tsquery)
  Buffers: shared hit=115 read=13000
  ->  Bitmap Index Scan on fulldata_plaintext_idx  (cost=0.00..261.02 rows=23069 width=0) (actual time=104.834..104.834 rows=23132 loops=1)
        Index Cond: (to_tsvector('russian'::regconfig, plaintext) @@ '''москв'''::tsquery)
        Buffers: shared hit=3 read=21
Total runtime: 19512.479 ms

2nd run:
Bitmap Heap Scan on fulldata  (cost=266.79..39162.57 rows=23069 width=8) (actual time=25.423..48.649 rows=23132 loops=1)
  Recheck Cond: (to_tsvector('russian'::regconfig, plaintext) @@ '''москв'''::tsquery)
  Buffers: shared hit=13115
  ->  Bitmap Index Scan on fulldata_plaintext_idx  (cost=0.00..261.02 rows=23069 width=0) (actual time=18.057..18.057 rows=23132 loops=1)
        Index Cond: (to_tsvector('russian'::regconfig, plaintext) @@ '''москв'''::tsquery)
        Buffers: shared hit=24
Total runtime: 49.612 ms

select version()
'PostgreSQL 9.1.15, compiled by Visual C++ build 1500, 64-bit'



[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux