Re: GIN index always doing Re-check condition, postgres 9.1

On Mon, Nov 2, 2015 at 12:19 AM, Andrey Osenenko
<andrey.osenenko@xxxxxxxxx> wrote:
>
> It also looks like if there was a way to create a table with just primary
> key and add an index to it that indexes data from another table, it would
> work much, much faster since there would be very little to read from disk
> after index lookup. But looks like there isn't.

There is a way to do this, but it is pretty gross.

You can define a function which takes the primary key as input and
returns the data to index.  Mark the function as immutable, even
though it isn't.  Something like:

CREATE OR REPLACE FUNCTION public.filler_by_aid(integer)
 RETURNS text
 LANGUAGE sql
 IMMUTABLE
AS $function$ select filler::text from pgbench_accounts where aid=$1 $function$;

Then create a table which has just the primary key, and create a
functional index on that table:

create table foobar as select aid from pgbench_accounts;
create index on foobar (filler_by_aid(aid));

Now you can query the skinny table by reference to the data in the wide table:

explain analyze select count(*) from foobar where filler_by_aid(aid)='zebra';

Since you fibbed to PostgreSQL about the function's immutability, it is
pretty easy to get a corrupt index here.  Every time the parent is
updated, you have to be sure to delete and reinsert the primary key in
the corresponding skinny table, otherwise the functional index will not
reflect the updated value.

What you gain in the skinny table you could very well lose with the
triggers needed to maintain it.  Not to mention the fragility.
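
For concreteness, the maintenance trigger could look roughly like this.
This is only a sketch: the trigger and function names are made up, and
firing AFTER UPDATE OF filler is just one way to arrange it, so treat it
as an illustration rather than a tested recipe:

CREATE OR REPLACE FUNCTION foobar_resync() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    -- Delete and reinsert the key so the functional index entry
    -- gets rebuilt against the new value of filler.
    DELETE FROM foobar WHERE aid = NEW.aid;
    INSERT INTO foobar (aid) VALUES (NEW.aid);
    RETURN NULL;
END;
$$;

CREATE TRIGGER pgbench_accounts_resync
    AFTER UPDATE OF filler ON pgbench_accounts
    FOR EACH ROW EXECUTE PROCEDURE foobar_resync();

You would need similar handling for inserts and deletes on the parent
table, which is part of the fragility.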


It would be simpler if you could just force the wide data to always be
toasted, even if it is not wide enough to trigger the default toast
threshold.  You could get a skinnier table (although not quite as
skinny as one with only a single column), without having to do
unsupported hacks.  (I am assuming here, without real evidence other
than intuition, that most of your news articles are in fact coming in
under the toast threshold).


> So am I correct in assumption that as the amount of rows grows, query times
> for rows that are not in memory (and considering how many of them there are,
> most won't be) will grow linearly?

Yep.

What you really need are index-only scans.  But those are not
supported by GIN indexes, and given the GIN index structure it seems
unlikely they ever will be, at least not in a way that would help you.

What are you going to do with these 23,000 primary keys once you get
them, anyway?  Perhaps you can push that analysis into the database
and gain some efficiencies there.

Or you could change your data structure.  If all you are doing is
searching for one tsvector token at a time, you could unnest the
tsvector and store it in a table like (ts_token text, id_iu bigint).
Then build a regular btree index on (ts_token, id_iu) and get
index-only scans (once you upgrade from 9.1 to something newer).
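
Roughly, something like this.  Again just a sketch: the source table and
column names (articles, body, id_iu) are placeholders for your schema,
and unnest() on a tsvector only exists in newer releases (9.6+), so on
9.1 you would have to populate the token table some other way:

CREATE TABLE article_tokens AS
    SELECT t.lexeme AS ts_token, a.id_iu
    FROM articles AS a,
         unnest(to_tsvector('english', a.body)) AS t;

CREATE INDEX ON article_tokens (ts_token, id_iu);

-- single-token searches can then come from an index-only scan (9.2+)
SELECT id_iu FROM article_tokens WHERE ts_token = 'zebra';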

Cheers,

Jeff




