I have a table with roughly 300 000 rows, each row containing two large strings - one is news article HTML, the other is news article plaintext. The table has a bigint primary key.
A GIN index is used to do fulltext search on the plaintext part. All I want to retrieve when I do fulltext search is values of primary key column.The problem I have is that postgres always does Re-check condition step for my request. That query with 23k rows takes 20 seconds to execute, and EXPLAIN shows that almost all of that time is spent
What this means is that every time user does a search for some word no one searched before, he has to wait a very long time, which is unacceptable for us.
CREATE TABLE kard_md.fulldata
(
id_iu bigint NOT NULL,
url character varying NOT NULL,
original text,
edited text,
plaintext text,
date timestamp without time zone,
CONSTRAINT fulldata_pkey PRIMARY KEY (id_iu)
);
CREATE INDEX fulldata_plaintext_idx
ON kard_md.fulldata
USING gin
(to_tsvector('russian'::regconfig, plaintext));
EXPLAIN (ANALYZE, BUFFERS) select id_iu from kard_md.fulldata where to_tsvector('russian',fulldata.plaintext) @@ plainto_tsquery('russian','москва');
1st run:
Bitmap Heap Scan on fulldata (cost=266.79..39162.57 rows=23069 width=8) (actual time=135.727..19499.667 rows=23132 loops=1)
Recheck Cond: (to_tsvector('russian'::regconfig, plaintext) @@ '''москв'''::tsquery)
Buffers: shared hit=115 read=13000
-> Bitmap Index Scan on fulldata_plaintext_idx (cost=0.00..261.02 rows=23069 width=0) (actual time=104.834..104.834 rows=23132 loops=1)
Index Cond: (to_tsvector('russian'::regconfig, plaintext) @@ '''москв'''::tsquery)
Buffers: shared hit=3 read=21
Total runtime: 19512.479 ms
2nd run:
Bitmap Heap Scan on fulldata (cost=266.79..39162.57 rows=23069 width=8) (actual time=25.423..48.649 rows=23132 loops=1)
Recheck Cond: (to_tsvector('russian'::regconfig, plaintext) @@ '''москв'''::tsquery)
Buffers: shared hit=13115
-> Bitmap Index Scan on fulldata_plaintext_idx (cost=0.00..261.02 rows=23069 width=0) (actual time=18.057..18.057 rows=23132 loops=1)
Index Cond: (to_tsvector('russian'::regconfig, plaintext) @@ '''москв'''::tsquery)
Buffers: shared hit=24
Total runtime: 49.612 ms
select version()
'PostgreSQL 9.1.15, compiled by Visual C++ build 1500, 64-bit'