Re: GIN index always doing Re-check condition, postgres 9.1

Andrey Osenenko <andrey.osenenko@xxxxxxxxx> · Mon, 2 Nov 2015 11:19:22 +0300

Thank you.

That's really sad news. This means that even though 
there is an index that lets you find rows you want almost immediately, 
to retrieve primary keys, you still have to do a lot of disk io.

I
 created a new table that contains only primary key and tsvector value, 
and (at least that's how I'm interpreting it) since there is less data 
to read per row, it returns same results about 2 times as quickly (I 
restarted computer to make sure nothing is in memory).

Original table:
Bitmap Heap Scan on fulldata  (cost=266.79..39162.57 rows=23069 width=8) (actual time=113.472..18368.769 rows=23132 loops=1)
  Recheck Cond: (to_tsvector('russian'::regconfig, plaintext) @@ '''москв'''::tsquery)
  Buffers: shared hit=1 read=13114

 ->  Bitmap Index Scan on fulldata_plaintext_idx  (cost=0.00..261.02 
rows=23069 width=0) (actual time=90.859..90.859 rows=23132 loops=1)
        Index Cond: (to_tsvector('russian'::regconfig, plaintext) @@ '''москв'''::tsquery)
        Buffers: shared hit=1 read=23
Total runtime: 18425.265 ms

Table with only key and vector:
Bitmap Heap Scan on fts  (cost=273.67..27903.61 rows=23441 width=8) (actual time=219.896..10095.159 rows=23132 loops=1)
  Recheck Cond: (vector @@ '''москв'''::tsquery)
  Buffers: shared hit=1 read=10877

 ->  Bitmap Index Scan on fts_vector_idx  (cost=0.00..267.81 
rows=23441 width=0) (actual time=204.631..204.631 rows=23132 loops=1)
        Index Cond: (vector @@ '''москв'''::tsquery)
        Buffers: shared hit=1 read=23
Total runtime: 10103.858 ms

It
 also looks like if there was a way to create a table with just primary 
key and add an index to it that indexes data from another table, it 
would work much, much faster since there would be very little to read 
from disk after index lookup. But looks like there isn't.

So am I
 correct in assumption that as the amount of rows grows, query times for
 rows that are not in memory (and considering how many of them there 
are, most won't be) will grow linearly?

On Mon, Nov 2, 2015 at 11:14 AM, Andrey Osenenko <andrey.osenenko@xxxxxxxxx> wrote:
Thank you.

That's really sad news. This means that even though there is an index that lets you find rows you want almost immediately, to retrieve primary keys, you still have to do a lot of disk io.

I created a new table that contains only primary key and tsvector value, and (at least that's how I'm interpreting it) since there is less data to read per row, it returns same results about 2 times as quickly (I restarted computer to make sure nothing is in memory).

Original table:
Bitmap Heap Scan on fulldata  (cost=266.79..39162.57 rows=23069 width=8) (actual time=113.472..18368.769 rows=23132 loops=1)
  Recheck Cond: (to_tsvector('russian'::regconfig, plaintext) @@ '''москв'''::tsquery)
  Buffers: shared hit=1 read=13114
  ->  Bitmap Index Scan on fulldata_plaintext_idx  (cost=0.00..261.02 rows=23069 width=0) (actual time=90.859..90.859 rows=23132 loops=1)
        Index Cond: (to_tsvector('russian'::regconfig, plaintext) @@ '''москв'''::tsquery)
        Buffers: shared hit=1 read=23
Total runtime: 18425.265 ms

Table with only key and vector:
Bitmap Heap Scan on fts  (cost=273.67..27903.61 rows=23441 width=8) (actual time=219.896..10095.159 rows=23132 loops=1)
  Recheck Cond: (vector @@ '''москв'''::tsquery)
  Buffers: shared hit=1 read=10877
  ->  Bitmap Index Scan on fts_vector_idx  (cost=0.00..267.81 rows=23441 width=0) (actual time=204.631..204.631 rows=23132 loops=1)
        Index Cond: (vector @@ '''москв'''::tsquery)
        Buffers: shared hit=1 read=23
Total runtime: 10103.858 ms

It also looks like if there was a way to create a table with just primary key and add an index to it that indexes data from another table, it would work much, much faster since there would be very little to read from disk after index lookup. But looks like there isn't.

So am I correct in assumption that as the amount of rows grows, query times for rows that are not in memory (and considering how many of them there are, most won't be) will grow linearly?