Re: GIN index always doing Re-check condition, postgres 9.1

Jeff Janes <jeff.janes@xxxxxxxxx> · Sun, 1 Nov 2015 23:22:47 -0800

On Sun, Nov 1, 2015 at 10:52 PM, Andrey Osenenko
<andrey.osenenko@xxxxxxxxx> wrote:
> I have a table with roughly 300 000 rows, each row containing two large
> strings - one is news article HTML, the other is news article plaintext. The
> table has a bigint primary key.
>
> A GIN index is used to do fulltext search on the plaintext part. All I want
> to retrieve when I do fulltext search is values of primary key column.
>
> With a popular word, the amount of results from fulltext search query can be
> pretty high - a query can return 23 000 rows and some can more, and will
> return more as the database continues to grow.
>
> The problem I have is that postgres always does Re-check condition step for
> my request. That query with 23k rows takes 20 seconds to execute, and
> EXPLAIN shows that almost all of that time is spent
> re-checking condition.

Explain does not address the issue of how much time was spent doing
rechecks.  You are misinterpreting something.

> The second time I run the same query, I get results
> immediately.

That suggests the time is spent reading data from disk the first time,
not spent doing rechecks.  Rechecks do not get faster by repeated
execution, except indirectly to the extent the data has already been
pulled into memory.  But other things get faster due to that, as well.

Now those are not mutually exclusive, as doing a recheck might lead to
reading toast tables that don't need to get read at all in the absence
of rechecks.  So rechecks can lead to IO problems.  But there is no
evidence that that is the case for you.

>
> 1st run:
> Bitmap Heap Scan on fulldata  (cost=266.79..39162.57 rows=23069 width=8)
> (actual time=135.727..19499.667 rows=23132 loops=1)
>   Recheck Cond: (to_tsvector('russian'::regconfig, plaintext) @@
> '''москв'''::tsquery)

This tells you what condition will be applied to the recheck, in case
a recheck is needed due to bitmap memory overflow.  It doesn't tell
how many times, if any, that was actually done, or how much time was
spent doing it.

As far as I know, there is no way to distinguish a "lossy index"
recheck form a "lossy bitmap" recheck in version 9.1.

>   Buffers: shared hit=115 read=13000

That you needed only 13115 blocks to deliver 23069 tells me that there
is little if any recheck-driven toast table access going on.  That the
second execution was very fast tells me that there is little
rechecking at all going on, because actual rechecking is CPU
intensive.

I don't think your problem has anything to do with rechecking.  You
simply have too much data that is not in memory.  You need more
memory, or some way to keep your memory pinned with what you need.  If
you are on a RAID, you could also increase effective_io_concurrency,
which lets the bitmap scan prefetch table blocks.

Cheers,

Jeff

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance