Re: new index type with clustering in mind.

Martijn van Oosterhout <kleptog@xxxxxxxxx> · Sat, 24 May 2014 23:16:58 +0200

On Sat, May 24, 2014 at 05:58:37PM +0100, Jack Douglas wrote:
> Would the following be practical to implement:

> A btree-like index type that points to *pages* rather than individual rows.
> Ie if there are many rows in a page with the same data (in the indexed
> columns), only one index entry will exist. In its normal use case, this
> index would be much smaller than a regular index on the same columns which
> would contain one entry for each individual row.

> To reduce complexity (eg MVCC/snapshot related issues), index entries would
> be added when a row is inserted, but they would not be removed when the row
> is updated/deleted (or when an insert is rolled back): this would cause
> index bloat over time in volatile tables but this would be acceptable for
> the use case I have in mind. So in essence, an entry in the index would
> indicate that there *may* be matching rows in the page, not that there
> actually are.
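
To make the "may be matching rows" semantics concrete, here is a minimal
Python sketch of such a lossy page-level index. It is a toy in-memory model
under stated assumptions, not Postgres code: LossyPageIndex, lookup, and the
heap layout (a dict of page number to a list of (key, payload) rows) are all
hypothetical names invented for illustration. Entries are added on insert
and never removed, so a lookup has to recheck every row on each candidate
page.

```python
from bisect import bisect_left


class LossyPageIndex:
    """Toy index with one entry per distinct (key, page) pair.

    Entries are never removed, so a hit only means the page *may*
    contain matching rows.
    """

    def __init__(self):
        self._entries = []  # sorted list of (key, page_no) tuples

    def insert(self, key, page_no):
        entry = (key, page_no)
        i = bisect_left(self._entries, entry)
        # Many rows on one page with the same key share a single entry.
        if i == len(self._entries) or self._entries[i] != entry:
            self._entries.insert(i, entry)

    def candidate_pages(self, key):
        # (key,) sorts before every (key, page_no), so this finds the
        # first entry for the key.
        i = bisect_left(self._entries, (key,))
        pages = []
        while i < len(self._entries) and self._entries[i][0] == key:
            pages.append(self._entries[i][1])
            i += 1
        return pages


def lookup(index, heap, key):
    """Read each candidate page in full and recheck every row on it."""
    matches = []
    for page_no in index.candidate_pages(key):
        for row in heap.get(page_no, []):
            if row[0] == key:
                matches.append(row)
    return matches
```

If every matching row on a page is later deleted, the page is still returned
as a candidate; the recheck simply finds nothing there.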

It's an interesting idea, but how can you *ever* delete index entries?
That is, is there a way to maintain the index without rebuilding it
regularly?

Maybe you could track all the entries that point to one page, or keep a
counter per entry.  The item number in a btree index's item pointer is
really only needed for deletion, since Postgres always has to read in
the whole page anyway.  So if you can find a way around that, it might
be an interesting improvement.
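
As a rough illustration of the counter idea, here is a minimal Python sketch
in which each (key, page) entry carries a count of the rows that reference
it, and the entry is dropped when the count reaches zero. CountedPageIndex
and its methods are hypothetical names, not anything in Postgres. The sketch
assumes remove_row is only called once a row version is really gone (for
example at vacuum time), and it ignores concurrency and crash safety
entirely.

```python
class CountedPageIndex:
    """Toy page-level index: (key, page_no) -> number of rows behind it."""

    def __init__(self):
        self._counts = {}

    def add_row(self, key, page_no):
        entry = (key, page_no)
        self._counts[entry] = self._counts.get(entry, 0) + 1

    def remove_row(self, key, page_no):
        # Assumption: the caller only reports rows that are truly dead.
        entry = (key, page_no)
        remaining = self._counts.get(entry, 0) - 1
        if remaining < 0:
            raise KeyError(entry)
        if remaining == 0:
            # Last row for this key on this page: the entry can go,
            # without ever having stored an item number.
            del self._counts[entry]
        else:
            self._counts[entry] = remaining

    def candidate_pages(self, key):
        # Linear scan for brevity; a real index would keep entries ordered.
        return sorted(p for (k, p) in self._counts if k == key)
```

The price is that every insert and every row removal has to touch the index
entry, which the original proposal was trying to avoid.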

Mind you, hash indexes could get this almost for free, except they're
not crash-safe.

Have a nice day,
-- 
Martijn van Oosterhout   <kleptog@xxxxxxxxx>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer