> The discussions at PGCon pointed out that with the posting-list compression logic added in 9.4, GIN indexes are pretty close to this already. Multiple items on the same heap page will typically only take one byte of index space per item; but there is an identifiable entry, so you don't get into these questions of when VACUUM should remove entries, and it's not lossy so you're not forced to pay the overhead of rechecking every entry on the linked-to page.
>
> Not to say that 9.4 GIN is necessarily the last word on the subject, but it would be worth testing it out before deciding that we need something better. (beta1 is out. It needs testing. Hint hint.)

Hint taken, and first impressions are positive: the compression is very efficient for the kind of scenario I'm imagining, where the key is deliberately chosen so that the average page has one distinct key. I have a 25 MB GIN 'cluster' index on a table where the equivalent regular btree index is ten times as large.

So the questions are: a) is this kind of clustering broadly useful (i.e. not just to me)? b) how much effort would it be to implement a 'vacuum-like' operation that scans a designated index and performs the relevant deletes/inserts to achieve this kind of clustering? and c) if it is broadly useful and not a major implementation mountain to climb, is it something that might be added to the todo list?

If someone can tell me how to decode a `ctid` into a page number (discarding the row-number portion - is there a better way than `(replace(replace(ctid::text,'(','{'),')','}')::integer[])[1]`?), I should be able to show some analysis demonstrating this working, albeit inefficiently, since I'll have to scan the table itself for the page/key statistics. Would that sort of analysis be helpful?
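For what it's worth, one trick that avoids the string surgery is to cast the `ctid`'s text form through `point`, since `'(41,7)'` happens to be valid `point` input. A sketch of both the extraction and the page/key statistics I have in mind (`my_table` and `cluster_key` are placeholder names):

```sql
-- The text form of a ctid, e.g. '(41,7)', parses as a point, so the
-- block number falls out as the first coordinate. Note that point
-- coordinates are 0-indexed, unlike the 1-indexed array in the
-- replace/replace expression above.
SELECT (ctid::text::point)[0]::int AS block_number
FROM my_table;

-- Page/key statistics for the clustering experiment: how many distinct
-- cluster keys land on each heap page. With ideal clustering this
-- should be 1 for almost every page.
SELECT (ctid::text::point)[0]::int AS block_number,
       count(DISTINCT cluster_key) AS keys_on_page
FROM my_table
GROUP BY 1
ORDER BY 1;
```

This still requires a full scan of the heap, of course, so it only demonstrates the distribution, not an efficient way to maintain it.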
Kindest regards,
Jack

PS It occurs to me that the btree_gin documentation page for 9.4, http://www.postgresql.org/docs/9.4/static/btree-gin.html, might benefit from some mention of index compression when discussing the relative performance of regular and GIN btree indexes.