Re: cluster index on a table

"phb07@xxxxxxxxxxxx" <phb07@xxxxxxxxxxxx> · Fri, 17 Jul 2009 15:25:52 +0200

Hi all,

> On Wed, Jul 15, 2009 at 10:36 PM, Scott Marlowe <scott.marlowe@xxxxxxxxx> wrote:
>
> I'd love to see it.
>
> +1 for index organized tables
>
> --Scott

+1 also for me...

I am currently working for a large customer who is migrating its main 
application to PostgreSQL. The application currently uses DB2 and 
RFM-II (an RDBMS used on Bull GCOS 8 mainframes). With both RDBMSs, 
"cluster indexes" are used and data rows are stored according to 
these indexes. The benefits are:
- a good performance level, especially for batch chains that more or 
less "scan" a lot of large tables,
- table reorganisations that remain fairly infrequent (about once a 
month).
To keep a good performance level with PostgreSQL, I expect that we will 
need more frequent reorganisation operations, with the drawbacks this 
generates for the production schedules. This is one of the very few 
regressions we need to address (or maybe the only one).

Despite my currently limited knowledge of the postgres internals, I don't 
see why it should be difficult to simply adapt the logic used to determine the 
data row location at insert time, using something like :
- read the cluster index to find the tid of the row whose key value is 
just below the key value of the row to insert,
- if there is enough room in that same page (thanks to FILLFACTOR or to 
a previous row deletion), use it,
- else use the first available space found through the FSM (free space map).
This changes nothing in the MVCC mechanism, does not change index 
structure or management, and does not require moving data rows.
It does not ensure that all rows are always in the "right" order, but 
if the FILLFACTORs are set correctly, most rows end up well placed, so 
less reorganisation is required.
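
The three steps above can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions, not PostgreSQL code: the ToyHeap class, its bulk_load and insert methods, and the fixed page capacity counted in rows are all hypothetical names and simplifications. The "cluster index" is a sorted list of (key, page number) pairs, and the free space map lookup is replaced by a linear scan for the first page with room.

```python
import bisect


class ToyHeap:
    """Toy heap table plus a sorted 'cluster index' of (key, page_no) pairs.

    Hypothetical model for illustration only; it ignores MVCC, row sizes,
    and everything else a real storage engine has to handle.
    """

    def __init__(self, page_capacity=4):
        self.page_capacity = page_capacity  # assumed: max rows per page
        self.pages = []   # each page is a list of keys, in slot order
        self.index = []   # sorted list of (key, page_no)

    def bulk_load(self, keys, fillfactor=0.5):
        # Simulates a freshly reorganised, empty table: rows are stored in
        # key order and each page is filled only up to the fill factor.
        per_page = max(1, int(self.page_capacity * fillfactor))
        for i, key in enumerate(sorted(keys)):
            if i % per_page == 0:
                self.pages.append([])
            page_no = len(self.pages) - 1
            self.pages[page_no].append(key)
            self.index.append((key, page_no))

    def _first_page_with_room(self):
        # Stand-in for a free space map lookup; extends the table if needed.
        for page_no, page in enumerate(self.pages):
            if len(page) < self.page_capacity:
                return page_no
        self.pages.append([])
        return len(self.pages) - 1

    def insert(self, key):
        # Step 1: find the index entry whose key is just below the new key.
        pos = bisect.bisect_left(self.index, (key, -1))
        target = None
        if pos > 0:
            _, neighbour_page = self.index[pos - 1]
            # Step 2: reuse the neighbour's page if it still has room.
            if len(self.pages[neighbour_page]) < self.page_capacity:
                target = neighbour_page
        # Step 3: otherwise fall back to the first page with free space.
        if target is None:
            target = self._first_page_with_room()
        self.pages[target].append(key)
        bisect.insort(self.index, (key, target))
        return target  # page number where the row was stored


heap = ToyHeap(page_capacity=4)
heap.bulk_load([10, 20, 30, 40], fillfactor=0.5)
print(heap.insert(15), heap.pages)  # lands next to 10 while page 0 has room
```

In this model, rows keep landing on their neighbour's page as long as the room reserved by the fill factor lasts. Once a page is full, new rows spill to wherever space is found, which is the point at which a reorganisation becomes useful again.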
But I am probably missing something ;-)

Regards. Philippe Beaudoin.