> > > To reduce complexity (eg MVCC/snapshot related issues), index entries > > > would be added when a row is inserted, but they would not be removed > > > when the row is updated/deleted (or when an insert is rolled back): > > It's an interesting idea, but, how can you *ever* delete index entries? > > I.e. is there a way to maintain the index without rebuilding it > > regularly? > > The discussions at PGCon pointed out that with the posting-list compression > logic added in 9.4, GIN indexes are pretty close to this already. Multiple > items on the same heap page will typically only take one byte of index space > per item; but there is an identifiable entry, so you don't get into these > questions of when VACUUM should remove entries, and it's not lossy so > you're not forced to pay the overhead of rechecking every entry on the > linked-to page. The README file in the source code for GIN indexes says: "Note: There is no delete operation in the key (entry) tree. The reason for this is that in our experience, the set of distinct words in a large corpus changes very slowly. This greatly simplifies the code and concurrency algorithms.". Does that mean that for the case where GIN is used as a simple replacement for btree, my initial suggestion above ("...would be added when a row is inserted, but they would not be removed...") is effectively what happens already with GIN indexes? --- I've written up a test to demonstrate the principle of this 'clustering lite': http://dba.stackexchange.com/q/66292/1396 In brief, it shows the benefit of this sort of clustering (much lower io versus unclustered, and no exclusive lock or heavy WAL generation versus current clustering implementations) with a workable way of achieving it in the current release. The basic idea is: 1. turn off autovacuum for the table 2. check each block to determine the degree of clustering 3. delete and re-insert all the rows from blocks below a clustering threshold 4. manually vacuum to free those (complete) blocks 5. repeat steps 2-4 as regularly as necessary A weakness is that is requires a full table scan is required. All that is needed to avoid the full-scan would be a way of getting `blkno` from an index-only scan, which as far as I can tell is not currently possible, is it?